Cross-Validation of SWOT Satellite Altimetry with In-Situ Measurements

This notebook presents a comprehensive framework for validating Surface Water and Ocean Topography (SWOT) satellite altimetry data using in-situ measurements from the BioSWOT-Med campaign in the Northwestern Mediterranean Sea. The SWOT mission, launched in December 2022 as a collaboration between NASA and CNES, represents a significant advancement in oceanographic remote sensing with its innovative wide-swath interferometric altimetry technology.

The BioSWOT-Med campaign, conducted in the Northwestern Mediterranean Sea, employed a Moving Vessel Profiler (MVP) to collect high-resolution temperature and salinity profiles along multiple transects. These measurements coincided with SWOT overpasses, creating an ideal dataset for satellite validation. The dynamic height calculated from MVP density profiles provides a direct in-situ counterpart to SWOT's satellite-measured sea surface height.

The in-situ MVP data used in this analysis was acquired by the Mediterranean Institute of Oceanography (MIO) in Marseille, France. This notebook represents the results of my research internship at MIO and consists of four complementary code sections, each addressing specific aspects of SWOT validation. While you can navigate directly to any section of interest, reading them in sequence is recommended for a comprehensive understanding of the validation framework:

  1. Validation Pipeline: Implements the complete processing chain from raw data to comparative visualization, producing maps and profiles comparing MVP dynamic height with SWOT Absolute Dynamic Topography (ADT). This section answers: How well do SWOT-derived sea surface height and geostrophic velocities match the in-situ measurements?

  2. Reference Level Optimization: Systematically tests different reference depths for dynamic height calculations to identify the optimal "level of no motion." This section addresses: What reference depth yields the best agreement between MVP dynamic height and SWOT altimetry?

  3. Statistical Validation: Provides quantitative metrics (correlation, RMSE, regression slope) to assess the improvement achieved with optimized reference levels. This section answers: How does optimizing the reference level improve the agreement between MVP and SWOT measurements?

  4. Spectral Analysis: Examines the power spectral density and coherence of MVP and satellite measurements across different spatial scales. This section addresses: At what spatial scales does SWOT accurately capture oceanographic features, and how does it compare to conventional altimetry?

All visualizations are generated at the end of each code section. For a more detailed discussion and interpretation of results, please refer to the complete thesis available on my GitHub repository.

The validation methods presented here are applicable to other regions and datasets, providing a robust framework for assessing satellite altimetry performance. By understanding the strengths and limitations of SWOT measurements across different spatial scales, particularly in the mesoscale-to-submesoscale transition region, we can better leverage this revolutionary altimetry technology for oceanographic research and applications.

1º Code - SWOT Satellite Altimetry Validation: Processing and Diagnostic Pipeline¶

This first section of the notebook implements a comprehensive workflow for validating Surface Water and Ocean Topography (SWOT) satellite altimetry data against in-situ measurements collected by the Moving Vessel Profiler (MVP).

This validation code:

  1. Loads and processes both satellite data (SWOT and DUACS) and in-situ MVP measurements
  2. Computes dynamic height from MVP temperature and salinity profiles
  3. Derives geostrophic velocities from both satellite and in-situ data sources
  4. Creates visualizations comparing sea surface height and velocity fields
  5. Calculates statistics to quantify agreement between different measurement systems

The analysis focuses on multiple "PL" (plongée, French for "dive") transects, where the research vessel collected MVP profiles coincident with SWOT overpasses. For each transect, the code generates a suite of diagnostic visualizations that examine both the absolute and relative agreement between satellite and in-situ observations.

Understanding the correspondence between these different measurement systems is crucial for assessing SWOT's capability to resolve fine-scale oceanographic features like fronts and eddies, which are important drivers of ocean dynamics and marine ecosystems.

1. Import libraries¶

1.1 Scientific Libraries and Computational Framework¶

The first section of our code imports essential scientific computing libraries that provide the foundation for oceanographic data analysis:

  • gsw (Gibbs SeaWater): A Python implementation of the Thermodynamic Equation of Seawater 2010 (TEOS-10). This toolbox is crucial for accurately calculating thermodynamic properties of seawater, particularly density and derived quantities like dynamic height and geostrophic velocity.

  • numpy: The fundamental package for scientific computing in Python, which provides support for large multi-dimensional arrays, matrices, and mathematical functions to operate on these arrays.

  • scipy.io: A module from SciPy that provides functions for working with various file formats. In our context, it is primarily used for loading MATLAB (.mat) files containing MVP data through the loadmat() function.

These core libraries establish the computational foundation for our oceanographic analysis. The Gibbs SeaWater (gsw) toolbox is particularly important as it implements the internationally adopted TEOS-10 standard, which has replaced the previous EOS-80 equation of state in oceanographic practice. This ensures that our calculations of density, dynamic height, and geostrophic velocities adhere to the most accurate thermodynamic representation of seawater properties.

In oceanographic studies, the precise calculation of density from temperature and salinity measurements is critical, as even small errors in density can propagate into significant errors in derived quantities like geostrophic currents. The gsw library provides functions that account for the spatial variations in the composition of seawater, offering improvements over older methods that assumed constant composition.

In [ ]:
## 1.1 Core scientific computing
import gsw                          # Gibbs Seawater toolbox
import numpy as np
import scipy.io as sio

1.2 Geospatial Processing Libraries¶

Oceanographic data analysis requires specialized tools for handling geographic coordinates, calculating distances on Earth's surface, and creating map projections. The following libraries support these geospatial operations:

  • geopy.distance: A module from the GeoPy package that provides methods for calculating geodesic distances between points on Earth's surface. Unlike simple Euclidean distance calculations, geodesic distances account for Earth's curvature, which is essential for accurate distance measurements, especially over longer transects. In our analysis, this library will be used to calculate the cumulative distance along MVP sampling tracks.

  • mpl_toolkits.basemap: A toolkit for plotting data on map projections in matplotlib. Basemap transforms coordinates from geographic (longitude, latitude) to map projections (x, y), enabling the creation of publication-quality maps with customizable features such as coastlines, political boundaries, and filled continents. While newer libraries like Cartopy are gradually replacing Basemap, it remains a robust tool for creating oceanographic maps.

Together, these geospatial libraries enable precise distance calculations along ship transects and effective visualization of oceanographic data in their proper geographical context. These capabilities are particularly important for our analysis, as we need to accurately represent the spatial relationships between satellite data and in-situ measurements across the Northwestern Mediterranean basin.

In [ ]:
## 1.2 Geospatial processing
import geopy.distance
from mpl_toolkits.basemap import Basemap

1.3 Visualization Tools¶

Effective data visualization is critical for oceanographic analysis, allowing us to interpret complex spatial patterns and compare different datasets. The core visualization library imported in this section is:

  • matplotlib.pyplot: The most widely used plotting library in the Python scientific ecosystem, providing a MATLAB-like interface for creating a wide variety of static, animated, and interactive visualizations. In our analysis, matplotlib (imported as plt) will be used to generate multiple plot types including:

    • Geographic maps displaying surface currents and sea surface height anomalies
    • Vector field plots (quiver plots) to represent geostrophic velocities
    • Line plots for comparing dynamic height profiles along transects
    • Scatter plots for point-wise data comparison
    • Multi-panel figures for comprehensive analysis of different variables

Matplotlib offers extensive customization options that will allow us to create clear, informative visualizations that highlight the key oceanographic features of interest. We'll use specific color schemes (defined later in the code) to consistently represent different data sources: MVP measurements (red), SWOT-derived ADT velocities (blue), SWOT total velocities (green), and SWOT projected velocities (dark green).

Through these visualizations, we'll be able to qualitatively and quantitatively assess the agreement between satellite and in-situ observations, identify oceanographic features like eddies and fronts, and evaluate the performance of the SWOT mission in capturing fine-scale ocean dynamics.

In [ ]:
## 1.3 Visualisation tools
import matplotlib.pyplot as plt

1.4 Data Handling and I/O Libraries¶

Oceanographic datasets come in various formats, each requiring specific libraries for effective handling. This section imports essential tools for data manipulation, file operations, and time series management:

  • pandas: A powerful data analysis library providing data structures like DataFrames that are ideal for tabular data manipulation. In our workflow, pandas will be used to organize MVP profiles, compute rolling averages for smoothing, and manage geospatial data along transects.

  • glob: A module that provides Unix-style pathname pattern expansion, allowing us to find files matching specific patterns. This is particularly useful for locating SWOT and DUACS data files that follow standard naming conventions but have variable date components.

  • os: The operating system interface module that provides functions for interacting with the file system, helping us manipulate file paths and extract file information.

  • datetime and timedelta: Classes from the standard library for manipulating dates and times. These are essential for handling the temporal aspects of oceanographic data, particularly for matching MVP transects with corresponding satellite overpasses.

  • xarray: A library designed for working with labeled, multi-dimensional arrays, particularly suited for handling NetCDF files common in geoscience. Xarray combines the capabilities of pandas and NumPy with the self-describing nature of NetCDF files, making it excellent for working with gridded satellite data.

  • netCDF4: A Python interface to the NetCDF C library, providing direct access to NetCDF files. While xarray builds on netCDF4 to provide a higher-level interface, this module gives us lower-level access when needed.

Together, these libraries provide a comprehensive toolkit for handling the diverse data formats and structures encountered in oceanographic analysis. They facilitate the integration of disparate data sources—from ship-based instruments to satellite observations—into a coherent analytical framework.

In [ ]:
## 1.4 Data handling and I/O
import pandas as pd
import glob
import os
from datetime import datetime, timedelta
import xarray as xr
import netCDF4 as nc

1.5 Interpolation Utilities¶

Oceanographic measurements often come from irregularly spaced sampling points, while analysis and comparison require data on regular grids or at specific locations. The following interpolation and spatial analysis tools address this challenge:

  • scipy.interpolate.griddata: A versatile function for interpolating scattered data onto a regular grid or at arbitrary query points. This tool is crucial for our workflow as it allows us to:

    • Interpolate satellite measurements (on a fixed grid) to MVP sampling points
    • Convert irregularly spaced MVP measurements to regular intervals along transects
    • Generate continuous fields from discrete observations

    Griddata supports multiple interpolation methods, including nearest-neighbor, linear, and cubic spline techniques, allowing us to select the appropriate method based on data density and the physical characteristics of the field being interpolated.

  • scipy.spatial.cKDTree: An optimized k-dimensional tree implementation that efficiently finds nearest neighbors in multi-dimensional space. In our analysis, this data structure accelerates the process of locating the closest satellite data points to each MVP measurement location, significantly improving computational performance when working with large satellite datasets.

  • scipy.spatial.ConvexHull: A computational geometry tool that calculates the smallest convex set containing a given set of points. We'll use this to delineate frontal zones in the Northwestern Mediterranean, creating polygon boundaries that can be overlaid on our maps to highlight oceanographically significant regions.

These spatial interpolation and analysis utilities bridge the gap between different sampling strategies, enabling direct comparison between satellite and in-situ measurements despite their inherently different spatial resolutions and sampling geometries.

In [ ]:
## 1.5 Interpolation utilities
from scipy.interpolate import griddata
from scipy.spatial import cKDTree, ConvexHull
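A small synthetic example of the two main spatial operations described above, interpolating scattered "satellite" samples to a query point with `griddata` and finding the nearest sample with `cKDTree` (the field and points are artificial):

```python
import numpy as np
from scipy.interpolate import griddata
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)

# Scattered samples of a smooth (here exactly linear) field f(x, y) = x + 2y
pts  = rng.uniform(0, 1, size=(200, 2))
vals = pts[:, 0] + 2 * pts[:, 1]

# Interpolate onto an arbitrary query point (e.g. an MVP station)
query = np.array([[0.5, 0.5]])
vi = griddata(pts, vals, query, method='linear')
print(vi)   # linear interpolation reproduces a linear field exactly: ≈ 1.5

# Nearest sample to the same point via a k-d tree
tree = cKDTree(pts)
dist, idx = tree.query(query)
print(pts[idx], dist)
```

The same pattern, with satellite grid nodes as `pts` and MVP positions as `query`, is what allows point-wise satellite/in-situ comparison later in the pipeline.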

1.6 Pattern Matching and Progress Monitoring¶

When processing multiple oceanographic datasets and running computationally intensive analyses, efficient file management and execution tracking become essential. This section imports utilities that improve code functionality and user experience:

  • re: The regular expression module from the Python standard library provides sophisticated pattern matching capabilities. In our workflow, regular expressions are particularly useful for:

    • Extracting date information from standardized SWOT and DUACS filename formats
    • Parsing text fields that contain structured information
    • Filtering files based on complex naming patterns

    For example, we'll use pattern matching to identify SWOT files corresponding to specific oceanographic survey dates, ensuring proper temporal alignment between satellite and in-situ observations.

  • tqdm: A fast, extensible progress bar library that provides visual feedback during long-running operations. When processing multiple MVP transects (referred to as PLs or "plongées" in the code), tqdm will display a progress bar showing:

    • The current operation status
    • Estimated completion time
    • Percentage of tasks completed

    This feedback is invaluable when running analyses that might take significant time, particularly when processing the full dataset of 20 transects (PL5 through PL24).

These utilities enhance both the functionality and usability of our analysis pipeline, making it easier to manage complex file structures and monitor the progress of computationally intensive operations.

In [ ]:
## 1.6 Pattern matching and progress monitoring
import re
from tqdm import tqdm
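A brief sketch of both utilities; note that the filename below is invented for illustration (real SWOT products embed dates in a similar `YYYYMMDDThhmmss` form, but the exact naming convention differs):

```python
import re
from tqdm import tqdm

# Hypothetical SWOT-style filename with embedded start/end date-times
fname = "SWOT_L3_LR_SSH_Expert_123_456_20230418T060000_20230418T070000.nc"

m = re.search(r'_(\d{8})T(\d{6})_', fname)
date, time = m.group(1), m.group(2)
print(date, time)   # 20230418 060000

# Progress bar over the transect list (PL5 … PL24)
for pl in tqdm(range(5, 25), desc="Processing PLs"):
    pass   # per-transect processing goes here
```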

1.7 Statistical Diagnostics¶

Robust validation of satellite altimetry data against in-situ measurements requires rigorous statistical analysis to quantify agreement, identify biases, and assess uncertainties. To support these statistical evaluations, we import:

  • statsmodels.api: A comprehensive library providing classes and functions for estimating statistical models, conducting statistical tests, and exploring data. In our oceanographic validation context, statsmodels offers several key capabilities:

    • Regression analysis tools to quantify relationships between MVP and SWOT measurements
    • Diagnostic plots for visualizing residuals and identifying potential systematic errors
    • Statistical tests to evaluate the significance of observed differences
    • Time series analysis functions for examining temporal patterns in the data

    The comment "# diagnostic plots" indicates that we'll primarily use statsmodels for creating visual diagnostics to assess the quality of our data and statistical models.

Statistical analysis is particularly important in our SWOT validation workflow as it provides quantitative metrics of agreement between different observation systems, helping to characterize the accuracy and precision of satellite-derived oceanographic variables relative to direct in-situ measurements. These metrics will ultimately help determine whether SWOT data can reliably capture fine-scale oceanographic features in the Northwestern Mediterranean.

In [ ]:
## 1.7 Statistical diagnostics
import statsmodels.api as sm        # diagnostic plots

1.8 Color Palette for Plots¶

Consistent visual encoding is crucial for effective data presentation, especially when comparing multiple data sources across numerous figures. This section defines a standardized color scheme for the different velocity measurements:

  • swot_color_adt = "blue": Represents velocities derived from SWOT Absolute Dynamic Topography (ADT) gradients. These velocities are calculated by applying geostrophic balance equations to the sea surface height field measured by the satellite.

  • swot_color_total = "green": Denotes the total geostrophic velocities provided directly in the SWOT data products (ugos, vgos). These may incorporate additional processing steps beyond simple ADT gradient calculations.

  • swot_color_proj = "#006400" (dark green): Used for SWOT velocities that have been projected onto directions perpendicular to the MVP transect. This projection facilitates direct comparison with MVP-derived velocities, which are inherently perpendicular to the ship's track.

  • mvp_color = "red": Identifies velocities calculated from the Moving Vessel Profiler (MVP) data. These in-situ measurements serve as our reference dataset for validating the satellite observations.

This consistent color coding will be applied throughout all visualizations, making it easier to track different data sources across multiple plots and facilitating the interpretation of comparative analyses. The use of distinct, contrasting colors ensures that velocity vectors from different sources remain visually distinguishable even when overlaid on the same map or graph.

In [ ]:
# 1.8 Colour palette for plots
swot_color_adt   = "blue"     # SWOT velocities derived from ADT
swot_color_total = "green"    # SWOT total velocities
swot_color_proj  = "#006400"  # SWOT projected velocities
mvp_color        = "red"      # MVP velocities

2. Utility Function Definitions¶

This section contains various utility functions that form the backbone of our oceanographic data processing pipeline. These functions handle common operations such as loading satellite data, converting time formats, and manipulating vector fields.

2.1 File Loading and Preprocessing¶

2.1.1 Generic Loader for SSH and Velocity Fields¶

The loadsshuv function provides a flexible interface for loading sea surface height (SSH) and velocity fields from NetCDF files, with special handling for SWOT data. This centralized loading function ensures consistent data processing across different satellite products.

loadsshuv(fname, rep, varn, **kwargs)

Parameters:

  • fname: Filename pattern to load
  • rep: Repository (directory) where the file is located
  • varn: Dictionary mapping standard variable names to dataset-specific names
  • **kwargs: Additional options including 'type' (data source) and 'area' (geographic bounds)

Key oceanographic variables extracted:

  1. SSH (Sea Surface Height): The height of the ocean surface relative to a reference ellipsoid
  2. SSHA (Sea Surface Height Anomaly): Deviation from the mean sea level
  3. MSS (Mean Sea Surface): Long-term average sea surface height
  4. MDT (Mean Dynamic Topography): Mean component of sea surface height due to ocean circulation
  5. ADT (Absolute Dynamic Topography): Total dynamic sea surface height, computed as:
$$\text{ADT} = \text{SSHA} + \text{MDT}$$

The function implements specific processing for SWOT data, including:

  • Geographic subsetting using the area parameter to focus on a specific region
  • Conversion of xarray DataArrays to NumPy masked arrays via to_masked_array()
  • Removal of rows containing only NaN values, which improves computational efficiency
  • Calculation of ADT from SSHA and MDT

For non-SWOT data (e.g., DUACS), the function uses a more generic approach, extracting variables based on the names provided in the varn dictionary.

The result is a standardized dictionary containing all necessary variables for oceanographic analysis, regardless of the input data source. This abstraction layer simplifies downstream code by providing a consistent interface to different altimetry products.

In [ ]:
### 2.1.1 Generic loader for SSH and velocity fields
def loadsshuv(fname, rep, varn, **kwargs):
    """
    Load lon/lat, SSH, U, V and ADT fields from a NetCDF file,
    with optional geographic sub-setting.
    """
    if kwargs.get('type') == 'swot':
        filei2 = glob.glob(rep + fname)[0]
        ds = xr.open_dataset(filei2)
        if 'area' in kwargs:
            area = kwargs['area']
            selection = ((ds.longitude > area[0]) &
                         (ds.longitude < area[2]) &
                         (ds.latitude  > area[1]) &
                         (ds.latitude  < area[3]))
            ds = ds.where(selection)

        lons = ds.longitude.to_masked_array()
        lats = ds.latitude.to_masked_array()
        us   = ds.ugos.to_masked_array()
        vs   = ds.vgos.to_masked_array()
        ssha = ds.ssha_filtered.to_masked_array()
        mss  = ds.mss.to_masked_array()
        mdt  = ds.mdt.to_masked_array()

        # Remove rows that contain only NaNs
        mask_row = np.all(np.isnan(us), axis=1)
        u    = us[~mask_row, :]
        v    = vs[~mask_row, :]
        lon  = lons[~mask_row, :]
        lat  = lats[~mask_row, :]
        ssha = ssha[~mask_row, :]
        mss  = mss[~mask_row, :]
        mdt  = mdt[~mask_row, :]

        # Absolute Dynamic Topography (ADT)
        adt = ssha + mdt

    else:
        # Generic loader (e.g. DUACS): variable names taken from varn
        filei = glob.glob(rep + fname)[0]
        ds = xr.open_dataset(filei)
        lon  = ds[varn['longitude']][:]
        lat  = ds[varn['latitude']][:]
        ssha = ds[varn['ssh']][:]
        adt  = ds['adt'][:]
        u    = ds[varn['u']][:]
        v    = ds[varn['v']][:]
    return {
        'lon' : lon,
        'lat' : lat,
        'ssh' : ssha,
        'u'   : u,
        'v'   : v,
        'ssha': ssha,
        'adt' : adt
    }
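The NaN-row masking step at the heart of the SWOT branch can be illustrated on a toy array:

```python
import numpy as np

# Toy 4×3 "swath": two rows contain only NaNs (e.g. land or no coverage)
us = np.array([[np.nan, np.nan, np.nan],
               [0.1,    0.2,    np.nan],
               [np.nan, np.nan, np.nan],
               [0.3,    0.4,    0.5   ]])

mask_row = np.all(np.isnan(us), axis=1)   # True where the entire row is NaN
u = us[~mask_row, :]
print(u.shape)   # (2, 3): only rows with at least one valid value remain
```

Rows with partial coverage (like the second row above) are kept, so no valid measurements are discarded; only fully empty rows are removed.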

2.1.2 Julian-day to Calendar Conversion¶

The jj2date function converts Julian day numbers into standard calendar dates with time components, a crucial operation when working with oceanographic datasets that use different time representations.

jj2date(data, origin)

Parameters:

  • data: Array of Julian day values (can be fractional)
  • origin: Reference date corresponding to Julian day 0

Julian days provide a continuous count of days and fractions of days since a reference epoch, making them convenient for calculating time intervals in scientific applications. However, human-readable calendar dates (YYYY-MM-DD hh:mm:ss) are more intuitive for data analysis and reporting.

The conversion process involves two main steps:

  1. Integer portion conversion: The whole number part of each Julian day is added to the origin date using pandas' to_timedelta function to calculate the calendar date.

  2. Fractional portion conversion: The decimal part of each Julian day is converted to hours, minutes, and seconds through the following calculations:

    • Hours = (fractional part) × 24
    • Minutes = (fractional hours) × 60
    • Seconds = (fractional minutes) × 60

For example, if we have a Julian day value of 43.75 with origin '2000-01-01':

  • The date would be 2000-01-01 + 43 days = 2000-02-13
  • The time would be 0.75 × 24 = 18 hours, 0 minutes, 0 seconds
  • The full timestamp would be '2000-02-13 18:00:00'

This function is particularly useful when processing MVP data that might store time information in Julian day format but needs to be aligned with SWOT satellite pass times, which are typically recorded in calendar dates.

In [ ]:
### 2.1.2 Julian-day to calendar conversion
def jj2date(data, origin):
    """
    Convert fractional Julian days into calendar dates (YYYY-MM-DD hh:mm:ss).
    """
    data   = np.asarray(data)
    origin = pd.to_datetime(origin)
    times  = []
    for d in data:
        day = (origin + pd.to_timedelta(d, unit='D')).strftime('%Y-%m-%d')
        h   = int((d % 1) * 24)
        m   = int(((d % 1) * 24 % 1) * 60)
        s   = int((((d % 1) * 24 % 1) * 60 % 1) * 60)
        times.append(f"{day} {h:02d}:{m:02d}:{s:02d}")
    return np.array(times)
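The worked example from the text can be checked directly; the cell below restates the function so it runs standalone:

```python
import numpy as np
import pandas as pd

def jj2date(data, origin):
    """Convert fractional Julian days into 'YYYY-MM-DD hh:mm:ss' strings."""
    data   = np.asarray(data)
    origin = pd.to_datetime(origin)
    times  = []
    for d in data:
        # Calendar date from the full (fractional) day offset
        day = (origin + pd.to_timedelta(d, unit='D')).strftime('%Y-%m-%d')
        # Fractional day converted to hours, minutes, seconds
        h = int((d % 1) * 24)
        m = int(((d % 1) * 24 % 1) * 60)
        s = int((((d % 1) * 24 % 1) * 60 % 1) * 60)
        times.append(f"{day} {h:02d}:{m:02d}:{s:02d}")
    return np.array(times)

print(jj2date([43.75], '2000-01-01'))   # ['2000-02-13 18:00:00']
```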

2.1.3 Recursive Flattening of Nested Structures¶

The extract_values function implements a recursive algorithm to extract individual values from deeply nested data structures, which is particularly useful when working with MATLAB files converted to Python.

extract_values(nested_array)

Parameter:

  • nested_array: A complex, potentially multi-level nested structure of arrays, lists, or tuples

MATLAB files (.mat) loaded via scipy.io.loadmat() often contain deeply nested arrays with non-intuitive structures. This complexity arises from the translation between MATLAB's matrix-oriented data storage and Python's object hierarchy. For example, a simple MATLAB scalar can become a nested NumPy array when loaded in Python.

The function handles three main cases:

  1. NumPy arrays:

    • If the array has object dtype ('O'), it likely contains other arrays or objects, so we recurse into it
    • Otherwise, we extract the single value using .item()
  2. Tuples:

    • Iteratively recurse into each element of the tuple
  3. Other values (including basic Python types):

    • Add directly to the output list

The recursion strategy uses an inner function recurse() that modifies the values list in the outer scope, accumulating all primitive values encountered during traversal. This approach efficiently flattens structures of arbitrary depth and complexity.

In our workflow, this function is particularly valuable when extracting date information or metadata from MVP .mat files, where the nested structure might otherwise make direct access challenging. By flattening these complex structures, we can more easily obtain the values needed for subsequent processing steps.

In [ ]:
### 2.1.3 Recursive flattening of nested structures
def extract_values(nested_array):
    """
    Flatten arbitrarily nested arrays, tuples or lists into a single list.
    """
    values = []
    def recurse(arr):
        for element in arr:
            if isinstance(element, np.ndarray):
                if element.dtype == 'O':
                    recurse(element)
                else:
                    values.append(element.item())
            elif isinstance(element, (tuple, list)):
                recurse(element)
            else:
                values.append(element)
    recurse(nested_array)
    return values
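A small demonstration of the flattening behaviour, mimicking the object-array nesting that `scipy.io.loadmat()` typically produces (the cell uses a trimmed copy of the function, with the tuple branch omitted, so it runs standalone):

```python
import numpy as np

def extract_values(nested_array):
    """Flatten arbitrarily nested object arrays into a single list (trimmed copy)."""
    values = []
    def recurse(arr):
        for element in arr:
            if isinstance(element, np.ndarray):
                if element.dtype == 'O':
                    recurse(element)    # object array: descend further
                else:
                    values.append(element.item())
            else:
                values.append(element)
    recurse(nested_array)
    return values

# loadmat()-style nesting: object arrays wrapping one-element numeric arrays
inner = np.empty(1, dtype=object); inner[0] = np.array([42])
outer = np.empty(2, dtype=object); outer[0] = inner; outer[1] = np.array([7.5])
print(extract_values(outer))   # [42, 7.5]
```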

2.2 Vector Manipulation Utilities¶

2.2.1 Planar Rotation¶

The rotate_vector function performs a standard 2D rotation of velocity vectors, which is essential for aligning velocities measured in different reference frames.

rotate_vector(u, v, angle_deg)

Parameters:

  • u: East-west (zonal) component of the vector
  • v: North-south (meridional) component of the vector
  • angle_deg: Rotation angle in degrees (positive for counterclockwise rotation)

Vector rotation is mathematically described by the transformation:

$$ \begin{bmatrix} u_{rot} \\ v_{rot} \end{bmatrix} = \begin{bmatrix} \cos(\theta) & -\sin(\theta) \\ \sin(\theta) & \cos(\theta) \end{bmatrix} \begin{bmatrix} u \\ v \end{bmatrix} $$

Where $\theta$ is the rotation angle in radians.

This function is particularly useful when we need to:

  1. Convert between geographic and local coordinate systems
  2. Align velocity vectors with the direction of a ship track
  3. Compute cross-track and along-track components of velocity

In oceanographic applications, rotations are often needed when comparing velocity fields from different sources. For example, MVP-derived velocities are naturally oriented perpendicular to the ship's track, while SWOT velocities are provided in standard east-north coordinates. Rotating these vectors to a common reference frame facilitates direct comparison.

The function first converts the input angle from degrees to radians using NumPy's deg2rad function, then applies the standard rotation matrix operations to compute the rotated components. The result is returned as a tuple containing the rotated u and v components.

In [ ]:
### 2.2.1 Planar rotation
def rotate_vector(u, v, angle_deg):
    """
    Rotate the vector (u, v) by angle_deg degrees.
    """
    angle_rad = np.deg2rad(angle_deg)
    u_rot = u * np.cos(angle_rad) - v * np.sin(angle_rad)
    v_rot = u * np.sin(angle_rad) + v * np.cos(angle_rad)
    return u_rot, v_rot
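A quick sanity check of the rotation (the function is restated so the cell runs standalone): a pure eastward current rotated 90° counterclockwise becomes pure northward flow.

```python
import numpy as np

def rotate_vector(u, v, angle_deg):
    """Rotate the vector (u, v) by angle_deg degrees (counterclockwise)."""
    angle_rad = np.deg2rad(angle_deg)
    u_rot = u * np.cos(angle_rad) - v * np.sin(angle_rad)
    v_rot = u * np.sin(angle_rad) + v * np.cos(angle_rad)
    return u_rot, v_rot

u_rot, v_rot = rotate_vector(1.0, 0.0, 90.0)
print(round(u_rot, 10), round(v_rot, 10))   # 0.0 1.0
```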

2.2.2 Moving Average (Centred Window)¶

The moving_average function implements a centered moving average filter that smooths time series or spatial data by averaging values within a sliding window.

moving_average(x, interval)

Parameters:

  • x: Input array to be smoothed
  • interval: Window size for averaging (must be odd to ensure symmetry)

Moving averages are fundamental tools in oceanographic data processing for:

  1. Removing high-frequency noise from measurements
  2. Highlighting lower-frequency signals of interest
  3. Reducing the impact of outliers or instrument errors

For a given point in the input array, the centered moving average computes the mean of all values within a window of size interval centered on that point. Mathematically, this is expressed as:

$$\bar{x}_i = \frac{1}{interval} \sum_{j=i-start}^{i+start} x_j$$

where $start = \frac{interval-1}{2}$ (integer division)

The implementation first calculates the starting offset (start) to ensure the window is properly centered. Then, for each valid position in the array (excluding points too close to the edges), it computes the mean of all values within the window.

Key characteristics of this implementation:

  • The window is always centered on the current point
  • Points at the beginning and end of the array (within start positions of the edges) are excluded from the output
  • The returned sequence has length len(x) - 2*start, shorter than the input array

In our oceanographic context, this function can be used to smooth MVP or satellite data spatially, reducing small-scale variations to focus on the larger oceanographic features of interest, such as fronts and eddies.

In [ ]:
### 2.2.2 Moving average (centred window)
def moving_average(x, interval):
    """
    Compute a moving average of x over a centred window of length = interval.
    """
    start = (interval - 1) // 2
    liste_moyennes = []
    for i in range(start, len(x) - start):
        liste_moyennes.append(np.mean(x[i - start:i + start + 1]))
    return np.array(liste_moyennes)
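A small worked example (self-contained, with the function reproduced) illustrates the edge trimming: a 5-point window over 7 samples keeps only 7 − 2·2 = 3 smoothed values:

```python
import numpy as np

def moving_average(x, interval):
    """Centred moving average; output is shorter than x by interval - 1."""
    start = (interval - 1) // 2
    return np.array([np.mean(x[i - start:i + start + 1])
                     for i in range(start, len(x) - start)])

# a 5-point window over the ramp 0..6 keeps only the 3 interior points
smoothed = moving_average(np.arange(7.0), 5)
```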

2.2.3 Projection of One Vector onto Another¶

The project_vector function computes the vector projection of one 2D vector onto another, which is essential for comparing velocity components along specific directions.

project_vector(uA, vA, uB, vB)

Parameters:

  • uA, vA: Components of the vector to be projected
  • uB, vB: Components of the direction vector to project onto

Vector projection is a fundamental operation in vector calculus and has important applications in oceanographic data analysis. When comparing velocities from different sources (e.g., SWOT vs. MVP), we often need to examine how much of one velocity field aligns with another.

The mathematical process involves three steps:

  1. Normalization of the direction vector (uB, vB) to create a unit vector:

    $$\hat{\mathbf{B}} = \frac{\mathbf{B}}{|\mathbf{B}|} = \frac{(u_B, v_B)}{\sqrt{u_B^2 + v_B^2}}$$

  2. Scalar projection of vector A onto the unit vector B̂:

    $$A_{proj} = \mathbf{A} \cdot \hat{\mathbf{B}} = u_A \cdot \hat{u}_B + v_A \cdot \hat{v}_B$$

  3. Vector projection calculation:

    $$\mathbf{A}_{proj} = A_{proj} \cdot \hat{\mathbf{B}} = A_{proj} \cdot (\hat{u}_B, \hat{v}_B)$$

The function includes a safety check for when vector B has zero magnitude, in which case it returns a zero vector to avoid division by zero.

In our oceanographic context, this function is particularly useful for:

  • Projecting SWOT velocity vectors onto the direction perpendicular to MVP transects
  • Calculating cross-frontal velocity components
  • Decomposing flow fields into along-track and cross-track components

This enables direct comparison between satellite-derived velocities and in-situ measurements, even when they're originally represented in different coordinate systems.

In [ ]:
### 2.2.3 Projection of one vector onto another
def project_vector(uA, vA, uB, vB):
    """
    Project vector (uA, vA) onto the direction of (uB, vB).
    """
    normB = np.sqrt(uB**2 + vB**2)
    if normB == 0:
        return 0., 0.
    uB_hat, vB_hat = uB / normB, vB / normB
    scalar_proj = uA*uB_hat + vA*vB_hat
    return scalar_proj * uB_hat, scalar_proj * vB_hat
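For instance, projecting a north-eastward flow onto a purely eastward direction should keep only the zonal component (a minimal sketch with the function reproduced so it runs standalone):

```python
import numpy as np

def project_vector(uA, vA, uB, vB):
    """Project vector (uA, vA) onto the direction of (uB, vB)."""
    normB = np.sqrt(uB**2 + vB**2)
    if normB == 0:
        return 0., 0.
    uB_hat, vB_hat = uB / normB, vB / normB
    scalar_proj = uA * uB_hat + vA * vB_hat
    return scalar_proj * uB_hat, scalar_proj * vB_hat

# (1, 1) projected onto the eastward direction keeps only the zonal part
u_p, v_p = project_vector(1.0, 1.0, 2.0, 0.0)
```

Note that the magnitude of the direction vector (here (2, 0)) does not affect the result, since it is normalized first.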

2.2.4 Date Extraction from SWOT Filenames¶

The extract_key_swot function implements a pattern-matching approach to extract acquisition dates from SWOT satellite data filenames, which is essential for temporal alignment with MVP measurements.

extract_key_swot(filename)

Parameter:

  • filename: String containing the full SWOT data filename

SWOT filenames follow a standardized naming convention that includes a timestamp in the format YYYYMMDDTHHMMSS, where:

  • YYYYMMDD represents the date (year, month, day)
  • T is a literal separator
  • HHMMSS represents the time (hours, minutes, seconds)

The function uses a regular expression pattern r'\d{8}T\d{6}' to identify this timestamp within the filename:

  • \d{8} matches exactly 8 digits (the date portion)
  • T matches the literal character 'T'
  • \d{6} matches exactly 6 digits (the time portion)

Once the timestamp is extracted, the function:

  1. Takes the first 8 characters (the date portion)
  2. Parses it using datetime.strptime with the format code '%Y%m%d'
  3. Converts it to a date object using .date()

If no matching timestamp is found in the filename, the function returns None.

This function serves as a key sorting criterion when organizing SWOT data files chronologically, which is necessary for matching satellite passes with the corresponding MVP transects. It ensures that we can efficiently identify which SWOT observations correspond to the same day as a particular MVP survey, enabling direct comparison between the two datasets.

In [ ]:
### 2.2.4 Date extraction from SWOT filenames
def extract_key_swot(filename):
    match = re.search(r'\d{8}T\d{6}', filename)
    if match:
        return datetime.strptime(match.group(0)[:8], '%Y%m%d').date()
    return None
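A quick demonstration with an illustrative filename: the timestamp follows the YYYYMMDDTHHMMSS pattern described above, while the surrounding fields are made up for the example:

```python
import re
from datetime import datetime

def extract_key_swot(filename):
    """Return the acquisition date parsed from a SWOT-style filename."""
    match = re.search(r'\d{8}T\d{6}', filename)
    if match:
        return datetime.strptime(match.group(0)[:8], '%Y%m%d').date()
    return None

# illustrative filename embedding a 20230410T063512 timestamp
fname = "SWOT_L3_LR_SSH_Expert_482_003_20230410T063512_20230410T072638_v2.0.0.nc"
acq_date = extract_key_swot(fname)
```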

3. Satellite and Auxiliary Data Handling¶

3.1 SWOT and DUACS Co-location¶

3.1.1 SWOT Interpolation onto MVP Positions¶

The load_SWOT function handles the crucial task of spatially co-locating SWOT satellite altimetry data with MVP in-situ measurement positions, which is essential for direct comparison between the two datasets.

load_SWOT(mvp, dayv, swotdir, savefig, **kwargs)

Parameters:

  • mvp: Dictionary containing MVP data, including longitude and latitude positions
  • dayv: Date string in the format "YYYY-MM-DD" for the desired SWOT pass
  • swotdir: Directory path where SWOT files are stored
  • savefig: Directory for saving figures (not used in this function but maintained for interface consistency)
  • **kwargs: Additional parameters (not used in this function)

The function performs several key operations:

  1. Geographic domain determination: First, it defines a rectangular region around the MVP transect by flooring the minima and ceiling the maxima to whole degrees, then adding a further 0.5° margin on every side:

    domain = [np.floor(min_longitude) - 0.5, np.floor(min_latitude) - 0.5,
              np.ceil(max_longitude) + 0.5, np.ceil(max_latitude) + 0.5]
    
  2. SWOT file identification: It uses pattern matching to find the appropriate SWOT file for the specified date:

    fname = glob.glob(swotdir + 'SWOT_L3_LR_SSH_Expert_*_' + 
                      dayv_dt.strftime('%Y%m%d') + '*_*.nc')
    
  3. Data loading: The function calls loadsshuv to extract SSH and velocity fields from the identified file, filtering by the calculated geographic domain.

  4. Spatial interpolation: Finally, it uses scipy.interpolate.griddata to interpolate the SWOT Absolute Dynamic Topography (ADT) field onto the exact MVP sampling locations using linear interpolation:

    adtc = griddata(points, valuesadt, (mvp['lon'], mvp['lat']), method='linear')
    

This interpolation step is critical for direct point-by-point comparison between satellite and in-situ measurements. Since SWOT sampling points rarely coincide exactly with MVP measurement locations, interpolation provides values at the precise coordinates where MVP data was collected.

The function returns the interpolated ADT values along the MVP transect, allowing for subsequent comparative analysis between the satellite-derived sea surface height and the dynamic height calculated from MVP temperature and salinity profiles.

In [ ]:
### 3.1.1 SWOT interpolation onto MVP positions
def load_SWOT(mvp, dayv, swotdir, savefig, **kwargs):
    """
    Load SWOT data and interpolate ADT onto MVP sampling points.
    """
    domain = [np.floor(np.min(mvp['lon'])) - 0.5,
              np.floor(np.min(mvp['lat'])) - 0.5,
              np.ceil(np.max(mvp['lon'])) + 0.5,
              np.ceil(np.max(mvp['lat'])) + 0.5]
    dayv_dt = datetime.strptime(dayv, "%Y-%m-%d")
    varn = {'longitude': 'longitude', 'latitude': 'latitude',
            'u': 'ugos', 'v': 'vgos', 'ssh': 'ssha_filtered'}
    fname = glob.glob(swotdir + 'SWOT_L3_LR_SSH_Expert_*_' +
                      dayv_dt.strftime('%Y%m%d') + '*_*.nc')
    if not fname:
        print(f"No SWOT file found in load_SWOT for date {dayv}")
        return None
    swot = loadsshuv(fname[0], '', varn, type='swot', area=domain)
    points = np.array((swot['lon'].flatten(), swot['lat'].flatten())).T
    valuesadt = swot['adt'].flatten()
    adtc = griddata(points, valuesadt,
                    (mvp['lon'], mvp['lat']), method='linear')
    return np.squeeze(adtc)
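The co-location step can be demonstrated on synthetic data: a plane ADT = lon + lat is sampled on a regular grid standing in for a SWOT swath and interpolated onto hypothetical ship-track positions. Linear interpolation reproduces a plane exactly, so the result can be checked analytically:

```python
import numpy as np
from scipy.interpolate import griddata

# synthetic "satellite" grid; the ADT field is the plane lon + lat
glon, glat = np.meshgrid(np.linspace(5, 7, 21), np.linspace(42, 44, 21))
adt = glon + glat

# hypothetical MVP sampling positions inside the grid
track_lon = np.array([5.3, 5.9, 6.4])
track_lat = np.array([42.2, 42.8, 43.5])

points = np.column_stack((glon.ravel(), glat.ravel()))
adt_on_track = griddata(points, adt.ravel(),
                        (track_lon, track_lat), method='linear')
```

Track points falling outside the grid's convex hull would come back as NaN with `method='linear'`, which is why the 0.5° domain margin matters in the real function.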

3.1.2 DUACS Interpolation onto MVP Positions¶

The load_DUACS function performs a similar co-location process as load_SWOT, but specifically for DUACS (Data Unification and Altimeter Combination System) altimetry products, which provide merged data from multiple satellite missions.

load_DUACS(mvp, dayv, duacsdir, savefig, **kwargs)

Parameters:

  • mvp: Dictionary containing MVP data including longitude and latitude
  • dayv: Date string in "YYYY-MM-DD" format
  • duacsdir: Directory containing DUACS data files
  • savefig: Directory for saving figures (not used but maintained for consistency)
  • **kwargs: Additional parameters (not used)

DUACS represents an important reference dataset in our validation framework because it:

  1. Combines observations from multiple altimetry missions (Jason, Sentinel-3, etc.)
  2. Applies sophisticated mapping techniques to provide a continuous field
  3. Represents the current operational standard for altimetry products

The function differs from load_SWOT in several notable ways:

  1. No explicit domain specification: Unlike load_SWOT, this function does not subset the data spatially. DUACS files typically cover predefined regions, so the entire file is loaded.

  2. Different file naming convention: DUACS follows its own naming pattern:

    'nrt_europe_allsat_phy_l4_' + date + '_*.nc'
    

    where 'nrt' stands for Near-Real-Time and 'l4' indicates Level-4 processing (gridded, gap-free fields).

  3. Different variable mapping: The variable names dictionary maps 'ssh' to 'sla' (Sea Level Anomaly) rather than 'ssha_filtered' as in SWOT.

  4. Mesh grid creation: DUACS data typically comes on a regular latitude-longitude grid, requiring creation of a mesh grid before flattening for interpolation:

    X, Y = np.meshgrid(duacs['lon'].values, duacs['lat'].values)
    

The interpolation process is otherwise identical to that used for SWOT data, employing linear interpolation to estimate ADT values at MVP sampling locations.

By providing both SWOT and DUACS interpolated fields at MVP points, we can perform a three-way comparison that helps identify which differences might be specific to SWOT and which are common to satellite altimetry in general.

In [ ]:
### 3.1.2 DUACS interpolation onto MVP positions
def load_DUACS(mvp, dayv, duacsdir, savefig, **kwargs):
    """
    Load DUACS data and interpolate ADT onto MVP sampling points.
    """
    dayv_dt = datetime.strptime(dayv, "%Y-%m-%d")
    varn = {'longitude': 'longitude', 'latitude': 'latitude',
            'u': 'ugos', 'v': 'vgos', 'ssh': 'sla'}
    fname = glob.glob(duacsdir + 'nrt_europe_allsat_phy_l4_' +
                      dayv_dt.strftime('%Y%m%d') + '_*.nc')
    if not fname:
        print(f"No DUACS file found in load_DUACS for date {dayv}")
        return None
    duacs = loadsshuv(fname[0], '', varn)
    X, Y = np.meshgrid(duacs['lon'].values, duacs['lat'].values)
    points = np.array((X.flatten(), Y.flatten())).T
    valuesadt = duacs['adt'].values.flatten()
    adtd = griddata(points, valuesadt,
                    (mvp['lon'], mvp['lat']), method='linear')
    return np.squeeze(adtd)

3.2 SWOT File Catalogue and Acquisition Dates¶

This code block creates a comprehensive inventory of available SWOT data files and extracts their acquisition dates, which is essential for temporal matching with MVP surveys.

The process consists of three main steps:

  1. File discovery: Using the glob module, all SWOT Level-3 Low Resolution (LR) Sea Surface Height (SSH) Expert files are located in the specified directory:

    swot_files = glob.glob('/Volumes/Extreme SSD/MIO/SWOT_V2.0.0/v2.0.0/SWOT_L3_LR_SSH_Expert_*_v2.0.0.nc')
    

    These files contain processed SWOT data with various corrections already applied, making them suitable for scientific analysis.

  2. Chronological sorting: The files are sorted by their acquisition dates using the previously defined extract_key_swot function as the sorting key:

    swot_files.sort(key=extract_key_swot)
    

    This ensures that the file list is organized chronologically, which simplifies subsequent temporal matching operations.

  3. Date extraction: For each file, the acquisition date is extracted from a specific position in the filename (the 8th element after splitting by underscores) and converted to a datetime object:

    date_strs = os.path.splitext(filename)[0].split('_')[7:8]
    for date_str in date_strs:
        dates_swot.append(datetime.strptime(date_str[:8], '%Y%m%d').date())
    

The resulting dates_swot list contains the acquisition dates of all available SWOT passes, which will be used later to find the SWOT observations that temporally coincide with each MVP transect.

This temporal alignment is crucial for meaningful validation, as oceanographic conditions can change rapidly. Ideally, the satellite and in-situ measurements should be acquired as close in time as possible to ensure they're observing the same oceanic state.

In [ ]:
## 3.2 SWOT file catalogue and acquisition dates
swot_files = glob.glob('/Volumes/Extreme SSD/MIO/SWOT_V2.0.0/v2.0.0/'
                       'SWOT_L3_LR_SSH_Expert_*_v2.0.0.nc')
swot_files.sort(key=extract_key_swot)
dates_swot = []
for f in swot_files:
    filename  = os.path.basename(f)
    date_strs = os.path.splitext(filename)[0].split('_')[7:8]
    for date_str in date_strs:
        dates_swot.append(datetime.strptime(date_str[:8], '%Y%m%d').date())

3.3 Frontal Region Convex-Hull Computation¶

This code block processes geographical coordinates to define the boundary of an oceanographic frontal zone, which will be visualized on maps to provide context for the MVP and SWOT measurements.

Ocean fronts are boundaries between different water masses, often characterized by strong horizontal gradients in properties like temperature, salinity, and density. These fronts can significantly influence ocean circulation and biological productivity, making them important features to identify in our analysis.

The process follows these steps:

  1. Load frontal zone data: A CSV file containing coordinates of points identified as belonging to the frontal region is loaded using pandas:

    df_f = pd.read_csv("/Volumes/Extreme SSD/MIO/tsg_crown_AFB.csv")
    
  2. Filter for the specific region: Only points labeled as region 'F' (front) are selected:

    region_f_df = df_f[df_f['Region'] == 'F']
    
  3. Extract coordinate pairs: The longitude and latitude values are extracted as a NumPy array, with any rows containing NaN values removed:

    coords_f = region_f_df[['Longitude', 'Latitude']].dropna().values
    
  4. Compute the convex hull: If there are sufficient points (more than 2), a convex hull is calculated using SciPy's ConvexHull function:

    if len(coords_f) > 2:
        hull_f = ConvexHull(coords_f)
        polygon_points_f = coords_f[hull_f.vertices]
    else:
        polygon_points_f = coords_f
    

    The convex hull is the smallest convex polygon that encloses all the points, providing a simplified boundary of the frontal zone.

The resulting polygon_points_f array contains the ordered vertices of the polygon that defines the frontal boundary. This polygon will be plotted on maps in subsequent visualizations, helping to contextualize the MVP measurements and SWOT observations relative to this important oceanographic feature.

By identifying and visualizing the frontal zone, we can examine how well SWOT captures the dynamic height gradients and associated geostrophic velocities across this oceanographically significant feature.

In [ ]:
## 3.3 Frontal region convex-hull computation
df_f = pd.read_csv("/Volumes/Extreme SSD/MIO/tsg_crown_AFB.csv")
region_f_df = df_f[df_f['Region'] == 'F']
coords_f = region_f_df[['Longitude', 'Latitude']].dropna().values
if len(coords_f) > 2:
    hull_f = ConvexHull(coords_f)
    polygon_points_f = coords_f[hull_f.vertices]
else:
    polygon_points_f = coords_f
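The hull computation can be illustrated on a toy point cloud (a stand-in for the coordinates read from the CSV): interior points are discarded and only the outer corners survive as polygon vertices:

```python
import numpy as np
from scipy.spatial import ConvexHull

# four corners of a square plus two interior points
coords = np.array([[5.0, 42.0], [6.0, 42.0], [6.0, 43.0], [5.0, 43.0],
                   [5.5, 42.5], [5.2, 42.7]])
if len(coords) > 2:
    hull = ConvexHull(coords)
    polygon_points = coords[hull.vertices]   # ordered hull vertices
else:
    polygon_points = coords
```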

3.3.1 Frontal Hull Plotting¶

The plot_fronte function visualizes the previously computed frontal zone boundary on map figures, highlighting this important oceanographic feature for contextual analysis.

plot_fronte(m)

Parameter:

  • m: A Basemap object on which to draw the frontal boundary

The function iterates through the vertices of the frontal polygon and connects consecutive points with line segments, creating a closed polygon that represents the boundary of the front. Several design choices enhance the visual representation:

  1. Adaptive line style: The function intelligently selects the line style based on the orientation of each segment:

    style = '--' if abs(lon2 - lon1) > abs(lat2 - lat1) else ':'
    
    • Dashed lines (--) are used for predominantly east-west segments
    • Dotted lines (:) are used for predominantly north-south segments

    This helps visually distinguish the orientation of different parts of the front.

  2. Consistent formatting: All segments are drawn in purple with a linewidth of 2, making the front easily distinguishable from other map elements.

  3. High z-order (15): The front is drawn above most other map elements to ensure visibility.

  4. Legend entry: Only the first segment is labeled in the legend as "Front" to avoid multiple duplicate entries.

The function includes a safety check to ensure there are at least two points available for drawing; otherwise, it returns without action.

In oceanographic analysis, fronts are critical features that often represent boundaries between different water masses with distinct properties. They can drive vertical circulation, enhance biological productivity, and influence the propagation of waves and eddies. By clearly marking the frontal boundary on our maps, we can better interpret the patterns observed in both the MVP and SWOT data, particularly examining how well satellite altimetry captures the sea surface signatures of these dynamic features.

In [ ]:
### 3.3.1 Frontal hull plotting
def plot_fronte(m):
    """
    Draw the convex hull of the front in violet.
    """
    if len(polygon_points_f) < 2:
        return
    for i in range(len(polygon_points_f)):
        j = (i + 1) % len(polygon_points_f)
        lon1, lat1 = polygon_points_f[i]
        lon2, lat2 = polygon_points_f[j]
        x1, y1 = m(lon1, lat1)
        x2, y2 = m(lon2, lat2)
        style = '--' if abs(lon2 - lon1) > abs(lat2 - lat1) else ':'
        if i == 0:
            m.plot([x1, x2], [y1, y2], color='purple',
                   linestyle=style, linewidth=2,
                   zorder=15, label='Front')
        else:
            m.plot([x1, x2], [y1, y2], color='purple',
                   linestyle=style, linewidth=2, zorder=15)

4. Transect-Based Velocity Diagnostics¶

When comparing velocity measurements from different sources along a ship track, we need specialized functions to calculate components in specific directions. This section introduces functions that compute signed velocity components relative to MVP transects.

4.1 Signed Component Perpendicular to Transect¶

The compute_signed_speed_along_transect function calculates the component of velocity perpendicular to the local direction of a transect, with sign indicating direction.

compute_signed_speed_along_transect(u, v, lon, lat)

Parameters:

  • u, v: Zonal and meridional velocity components
  • lon, lat: Longitude and latitude coordinates defining the transect

This function is essential for comparing satellite and in-situ velocities since MVP naturally measures the velocity component perpendicular to the ship's track. For a meaningful comparison, we need to project SWOT velocities in the same direction.

For each point along the transect, the algorithm follows these steps:

  1. Determine local transect direction: For each point i, calculate the vector (dx, dy) representing the local direction of the transect:

    • For the first point (i=0): Use the vector to the next point
    • For the last point (i=N-1): Use the vector from the previous point
    • For internal points: Use the centered difference between adjacent points
  2. Compute normal vector: The normal vector is perpendicular to the transect direction:

    nx, ny = dy, -dx  # 90-degree rotation of (dx, dy)
    
  3. Normalize to unit vector:

    nx, ny = nx / norm_n, ny / norm_n  # where norm_n = √(nx² + ny²)
    
  4. Calculate projection: The scalar (dot) product gives the component along the normal:

    w_signed[i] = u[i]*nx + v[i]*ny
    

This results in a signed velocity that is:

  • Positive when the flow is to the right of the transect direction
  • Negative when the flow is to the left of the transect direction
  • Zero when the flow is parallel to the transect

The function includes safety checks for edge cases:

  • Single-point transects (N_val = 1) return zero
  • Near-zero normal vectors (when the transect doubles back) return zero

This calculation enables direct comparison between MVP-measured velocities and satellite-derived velocities, despite their different native reference frames and measurement principles.

In [ ]:
## 4.1 Signed component perpendicular to transect
def compute_signed_speed_along_transect(u, v, lon, lat):
    """
    Compute the signed scalar component along the direction perpendicular
    to the transect defined by (lon, lat).
    """
    N_val = len(u)
    w_signed = np.zeros(N_val)
    for i in range(N_val):
        if N_val == 1:
            w_signed[i] = 0.0
            continue
        if i == 0:
            dx = lon[i+1] - lon[i]
            dy = lat[i+1] - lat[i]
        elif i == N_val-1:
            dx = lon[i] - lon[i-1]
            dy = lat[i] - lat[i-1]
        else:
            dx = lon[i+1] - lon[i-1]
            dy = lat[i+1] - lat[i-1]
        nx, ny = dy, -dx
        norm_n = np.sqrt(nx**2 + ny**2)
        if norm_n < 1e-10:
            w_signed[i] = 0.0
            continue
        nx, ny = nx / norm_n, ny / norm_n
        w_signed[i] = (u[i]*nx + v[i]*ny)
    return w_signed
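An equivalent vectorized sketch (using np.gradient, which applies the same centred differencing in the interior and one-sided differencing at the ends) confirms the sign convention: on an eastward transect, a northward flow lies to the left of travel and comes out negative. One deliberate difference from the loop version: degenerate segments are mapped to NaN here rather than zero.

```python
import numpy as np

def signed_cross_track(u, v, lon, lat):
    # np.gradient: centred differences inside, one-sided at both ends
    dx = np.gradient(np.asarray(lon, dtype=float))
    dy = np.gradient(np.asarray(lat, dtype=float))
    nx, ny = dy, -dx                     # right-hand normal to the track
    norm = np.hypot(nx, ny)
    norm[norm < 1e-10] = np.nan          # degenerate segments -> NaN
    return (u * nx + v * ny) / norm

# eastward track, uniform northward flow -> signed component is -1 everywhere
lon = np.array([0.0, 1.0, 2.0])
lat = np.zeros(3)
w = signed_cross_track(np.zeros(3), np.ones(3), lon, lat)
```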

4.2 Signed Full-Speed Magnitude¶

In oceanographic data analysis, understanding the direction of velocity vectors is crucial for interpreting ocean currents. The compute_signed_speed_tot function calculates the total speed of a velocity vector while preserving its directional information relative to a transect line.

The function takes four key inputs: zonal velocity component u, meridional velocity component v, longitude coordinates lon, and latitude coordinates lat.

The core algorithm works through several computational steps:

  1. Initialize an output array w_signed_tot filled with zeros, handling special cases like single-point input.

  2. Compute local transect direction vectors by calculating coordinate differences between adjacent points. This involves determining the local geometry of the transect line.

  3. Derive a normal vector perpendicular to the transect and normalize it.

Mathematically, the vector sign is determined by the dot product projection:

$$\text{sign} = \begin{cases} 1 & \text{if } \vec{v} \cdot \vec{n} \geq 0 \\ -1 & \text{if } \vec{v} \cdot \vec{n} < 0 \end{cases}$$

Where:

  • $\vec{v} = (u, v)$ is the velocity vector
  • $\vec{n} = (nx, ny)$ is the normalized normal vector
  • $\vec{v} \cdot \vec{n}$ represents the scalar projection

The final signed speed is computed as: $\text{signed speed} = \text{sign} \times |\vec{v}|$

This approach ensures that the velocity vector's magnitude and direction are preserved relative to the local transect geometry, providing a nuanced representation of ocean current dynamics.

In [ ]:
## 4.2 Signed full-speed magnitude
def compute_signed_speed_tot(u, v, lon, lat):
    """
    Return signed total speed: |(u, v)| with sign determined by the
    projection onto (nx, ny).
    """
    N_val = len(u)
    w_signed_tot = np.zeros(N_val)
    for i in range(N_val):
        if N_val == 1:
            w_signed_tot[i] = 0.0
            continue
        if i == 0:
            dx = lon[i+1] - lon[i]
            dy = lat[i+1] - lat[i]
        elif i == N_val-1:
            dx = lon[i] - lon[i-1]
            dy = lat[i] - lat[i-1]
        else:
            dx = lon[i+1] - lon[i-1]
            dy = lat[i+1] - lat[i-1]
        nx, ny = dy, -dx
        norm_n = np.sqrt(nx**2 + ny**2)
        speed = np.sqrt(u[i]**2 + v[i]**2)
        if norm_n < 1e-10:
            w_signed_tot[i] = 0.0
            continue
        nx, ny = nx / norm_n, ny / norm_n
        sign_ = 1.0 if (u[i]*nx + v[i]*ny) >= 0 else -1.0
        w_signed_tot[i] = sign_ * speed
    return w_signed_tot

5. MVP TRANSECT PROCESSING LOOP¶

The MVP (Moving Vessel Profiler) Transect Processing Loop represents the core analytical workflow of this oceanographic study. This comprehensive section transforms raw in situ measurements into refined scientific insights by systematically processing each marine transect collected during the BioSWOT-Med campaign.

The processing loop implements a multi-stage data transformation strategy that:

  • Extracts high-resolution oceanographic measurements
  • Performs advanced geophysical calculations
  • Prepares data for comparison with satellite-derived observations
  • Generates visualizations and statistical analyses

Each iteration of the loop will:

  1. Load raw MVP data
  2. Compute dynamic height and its anomalies
  3. Interpolate measurements onto a standardized grid
  4. Calculate geostrophic velocities
  5. Match MVP data with corresponding SWOT satellite overpasses
  6. Perform comparative analyses

The methodological approach allows for a rigorous, systematic investigation of submesoscale ocean dynamics in the Northwestern Mediterranean, bridging in situ measurements with satellite altimetry through sophisticated computational techniques.

5.0 Preliminary Set-up¶

The preliminary setup initializes the processing loop over the Moving Vessel Profiler (MVP) transects of the BioSWOT-Med campaign. The code prepares to systematically analyze a range of profile lines (PLs) from 5 to 24.

Key elements of this setup:

  • np.arange(5, 25) creates a NumPy array of integers from 5 to 24 (note that arange excludes the upper bound)
  • PLs stores the list of profile line numbers to be processed
  • tqdm() is a progress bar utility that provides visual feedback during the iterative processing of multiple transects

The subsequent for loop will iterate through each profile line, applying a comprehensive data processing workflow for each transect. This approach allows for:

  • Systematic data handling
  • Consistent analysis across multiple sampling locations
  • Tracking of processing progress through the tqdm progress bar

The use of an indexed loop ensures that each transect (PL) is processed individually, enabling detailed examination of local oceanographic characteristics in the Northwestern Mediterranean region.

In [ ]:
## 5.0 Preliminary set-up
PLs = np.arange(5, 25)

for PL in tqdm(PLs):

5.1 LOAD MVP DATA AND COMPUTE DYNAMIC HEIGHT¶

5.1.1 Read .mat metadata and convert dates¶

This code segment handles the crucial task of extracting and converting date information from MATLAB (.mat) files for each MVP transect. The process involves several key transformations:

  1. File Loading:

    • Uses scipy.io.loadmat() to read MATLAB metadata files
    • Dynamically loads files for each Profile Line (PL) using an f-string for filename generation
  2. Date Extraction:

    • extract_values() function recursively flattens nested date arrays
    • Converts extracted values into a pandas DataFrame for easier manipulation
  3. Date Conversion: The date conversion is a complex process that transforms MATLAB's serial date representation to Python datetime:

    • Uses datetime.fromordinal() to convert the integer part of the date
    • Adds fractional days using timedelta()
    • Subtracts 366 days to correct for MATLAB's date reference point

    Mathematical representation of the date conversion: $\text{Python datetime} = \text{fromordinal}(\lfloor d \rfloor) + (d \bmod 1)\ \text{days} - 366\ \text{days}$

This meticulous date processing ensures accurate temporal mapping of MVP transect measurements, which is critical for aligning in situ data with satellite observations in the BioSWOT-Med campaign.

In [ ]:
## 5.1.1 Read .mat metadata and convert dates
    mat_file = sio.loadmat(
        '/Volumes/Extreme SSD/MIO/MVP_TOOLS/BIOSWOTMED/'
        f'MVP/L2/PL{PL}.mat'
    )
    date_extract = extract_values(mat_file['metadata']['date'])
    date_pd  = pd.DataFrame(date_extract)
    datejj   = date_pd[0]
    dates_mvp = [
        datetime.fromordinal(int(d[0][0])) +
        timedelta(days=d[0][0] % 1) -
        timedelta(days=366)
        for d in datejj
    ]
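The conversion can be checked against a known value: MATLAB's datenum(2023,1,1) is 738887, so 738887.5 should map to noon on 1 January 2023. A standalone helper mirroring the list comprehension above:

```python
from datetime import datetime, timedelta

def matlab_datenum_to_datetime(datenum):
    """Convert a MATLAB serial date number to a Python datetime."""
    return (datetime.fromordinal(int(datenum))     # integer part of the date
            + timedelta(days=datenum % 1)          # fractional day
            - timedelta(days=366))                 # MATLAB epoch offset

dt = matlab_datenum_to_datetime(738887.5)          # noon, 1 January 2023
```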

5.1.2 Oceanographic variables and centred dynamic height anomaly¶

This code segment performs critical oceanographic calculations to derive dynamic height, transforming raw measurements into scientifically meaningful data:

  1. Data Loading:

    • Loads MATLAB file for specific Profile Line (PL)
    • Extracts longitude, latitude, pressure, temperature, and salinity data
  2. Thermodynamic Conversions:

    • Uses Gibbs Seawater (GSW) toolbox for precise oceanographic calculations
    • Converts Practical Salinity (SP) to Absolute Salinity (SA)
    • Calculates Conservative Temperature (CT)
  3. Dynamic Height Computation:

    • Uses geo_strf_dyn_height() to calculate geostrophic stream function
    • Divides by gravitational acceleration ($g = 9.81$ m/s²)
    • Handles NaN values to ensure data integrity
    • Computes reference level as maximum pressure for each profile
  4. Anomaly Calculation:

    • Calculates dynamic height anomaly by subtracting the mean
    • Creates a pandas DataFrame to store processed data

Key mathematical transformations include:

  • Absolute Salinity: $SA = f(\text{Practical Salinity}, \text{Pressure}, \text{Longitude}, \text{Latitude})$
  • Conservative Temperature: $CT = f(SA, \text{Temperature}, \text{Pressure})$
  • Dynamic Height Anomaly: $\text{dhc} = \text{dh} - \langle\text{dh}\rangle$

This processing converts raw measurements into a standardized format for further oceanographic analysis.

In [ ]:
## 5.1.2 Oceanographic variables and centred dynamic height anomaly
    mat = sio.loadmat(
        '/Volumes/Extreme SSD/MIO/MVP_TOOLS/BIOSWOTMED/'
        f'MVP/L2_python/PL{PL}_python.mat'
    )
    lon_mvp, lat_mvp = mat['lon'], mat['lat']
    p_mvp, t_mvp     = mat['pres'], mat['temp']
    SP_mvp           = mat['sal']

    SA = gsw.SA_from_SP(SP_mvp, p_mvp, lon_mvp.T, lat_mvp.T)
    CT = gsw.CT_from_t(SA, t_mvp, p_mvp)
    g_ = 9.81  # gravitational acceleration (m s⁻²)

    dh = []
    for i in range(SA.shape[0]):
        dh_i = (
            gsw.geo_strf_dyn_height(
                SA[i][~np.isnan(SA[i])],
                CT[i][~np.isnan(SA[i])],
                p_mvp[i][~np.isnan(SA[i])],
                p_ref=np.max(p_mvp[i][~np.isnan(SA[i])])
            )[0] / g_
        )
        dh.append(dh_i)
    dh = np.asarray(dh)   # list -> NumPy array so the subtraction broadcasts
    dhmean = np.mean(dh)
    dhc = dh - dhmean     # centred anomaly

    df_mvp_raw = pd.DataFrame({
        'dh' : dh,
        'dhc': dhc,
        'lon': lon_mvp[0],
        'lat': lat_mvp[0]
    })

5.2 DISTANCE METRIC AND REGULAR GRIDDING¶

5.2.1 Cumulative distance along transect¶

This code segment computes the cumulative distance along the MVP transect using geodesic distance calculations. The process involves:

  1. Distance Initialization:

    • Creates a zero-initialized array to store cumulative distances
    • Ensures the first point starts at zero distance
  2. Geodesic Distance Computation:

    • Uses geopy.distance.distance() to calculate geodesic distances on the WGS-84 ellipsoid
    • Computes distance between consecutive latitude-longitude points
    • Accumulates distances cumulatively along the transect

Key characteristics of the distance calculation:

  • Accounts for the Earth's curvature
  • Provides accurate kilometer-based distance measurements
  • Handles varying geographical trajectories

Mathematically, the cumulative distance is computed as:

$$\text{dist}[i] = \text{dist}[i-1] + \text{geodesic\_distance}(\text{point}_{i-1}, \text{point}_i)$$

The resulting dist column in the DataFrame represents the progressive distance along the ship's track, critical for subsequent spatial interpolation and analysis of oceanographic data.

In [ ]:
## 5.2.1 Cumulative distance along transect
    dist_along_mvp = np.zeros(len(df_mvp_raw))
    for i in range(1, len(df_mvp_raw)):
        dist_along_mvp[i] = (
            dist_along_mvp[i-1] +
            geopy.distance.distance(
                (df_mvp_raw['lat'].values[i-1],
                 df_mvp_raw['lon'].values[i-1]),
                (df_mvp_raw['lat'].values[i],
                 df_mvp_raw['lon'].values[i])
            ).km
        )
    df_mvp_raw['dist'] = dist_along_mvp

5.2.2 Interpolation to 1-km grid and 11-km smoothing¶

This code segment performs two critical data preprocessing steps: spatial interpolation and smoothing of oceanographic measurements.

  1. Interpolation Parameters:

    • Sets a uniform 1 km spatial sampling (step_km)
    • Defines an 11 km smoothing window (window_size_km)
    • Creates a regularly spaced distance grid from 0 to maximum transect distance
  2. Linear Interpolation:

    • Uses NumPy's interp() to map original measurements onto regular grid
    • Interpolates longitude, latitude, dynamic height, and dynamic height anomaly
    • Ensures consistent spatial resolution across different variables
  3. Smoothing Technique:

    • Applies a centered rolling window mean using pandas rolling() method
    • Enforces an odd window size to maintain symmetry
    • Reduces high-frequency noise while preserving mesoscale features
    • Computes smoothed values for raw dynamic height and its anomaly

Mathematical representations:

  • Interpolation: $\text{var}_{\text{reg}} = \text{interp}(\text{dist}_{\text{reg}}, \text{dist}_{\text{raw}}, \text{var}_{\text{raw}})$
  • Smoothing: $\text{var}_{\text{smooth}}[i] = \frac{1}{N} \sum_{j=-(N-1)/2}^{(N-1)/2} \text{var}_{\text{reg}}[i+j]$, with $N$ odd

The resulting DataFrame provides a consistently gridded and smoothed dataset, prepared for further oceanographic analysis.
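The two steps can be sketched on a synthetic signal (hypothetical data, mirroring the notebook's np.interp plus pandas rolling approach):

```python
import numpy as np
import pandas as pd

# Hypothetical irregular samples of a smooth signal (stand-in for dh along track)
rng = np.random.default_rng(0)
dist_raw = np.sort(rng.uniform(0, 50, 40))
var_raw = np.sin(dist_raw / 8.0)

# Step 1: linear interpolation onto a regular 1-km grid
dist_reg = np.arange(0, dist_raw[-1] + 1, 1)
var_reg = np.interp(dist_reg, dist_raw, var_raw)

# Step 2: centred rolling mean over an 11-point (= 11 km) window
window = 11  # odd, so the window is symmetric about each grid point
var_smooth = pd.Series(var_reg).rolling(window=window, center=True).mean().values
# (window - 1) / 2 = 5 NaNs appear at each end, hence the dropna in the notebook
```

With `center=True` and the default `min_periods`, the first and last five points of the smoothed series are NaN, which is exactly what the `dropna(subset=['dh', 'dhc'])` call in the notebook removes.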

In [ ]:
## 5.2.2 Interpolation to 1-km grid and 11-km smoothing
    step_km = 1
    window_size_km = 11
    dist_max = df_mvp_raw['dist'].iloc[-1]
    dist_reg = np.arange(0, dist_max + step_km, step_km)
    lon_reg  = np.interp(dist_reg, df_mvp_raw['dist'], df_mvp_raw['lon'])
    lat_reg  = np.interp(dist_reg, df_mvp_raw['dist'], df_mvp_raw['lat'])
    dh_reg   = np.interp(dist_reg, df_mvp_raw['dist'], df_mvp_raw['dh'])
    dhc_reg  = np.interp(dist_reg, df_mvp_raw['dist'], df_mvp_raw['dhc'])

    window_size = int(window_size_km / step_km)
    if window_size % 2 == 0:
        window_size += 1  # enforce odd window length

    dh_reg_smooth = (
        pd.Series(dh_reg)
        .rolling(window=window_size, center=True)
        .mean()
        .values
    )
    dhc_reg_smooth = (
        pd.Series(dhc_reg)
        .rolling(window=window_size, center=True)
        .mean()
        .values
    )

    df_mvp = pd.DataFrame({
        'dist'    : dist_reg,
        'lon'     : lon_reg,
        'lat'     : lat_reg,
        'dh_raw'  : dh_reg,
        'dhc_raw' : dhc_reg,
        'dh'      : dh_reg_smooth,
        'dhc'     : dhc_reg_smooth
    }).dropna(subset=['dh', 'dhc']).copy()

5.3 GEOSTROPHIC VELOCITY FROM MVP¶

This code segment calculates geostrophic velocity vectors from the Moving Vessel Profiler (MVP) dynamic height measurements, transforming vertical ocean structure data into horizontal current fields.

The geostrophic velocity calculation involves several key steps:

  1. Initial Velocity Computation:

    • Uses Gibbs Seawater (GSW) toolbox's geostrophic_velocity() function
    • Inputs smoothed dynamic height, longitude, and latitude values
    • Generates initial geostrophic velocity components
  2. Velocity Vector Transformation:

    • Computes local coordinate system angles using arctan2()
    • Converts geostrophic stream function to zonal (u) and meridional (v) velocity components
    • Applies gravitational acceleration scaling

Key mathematical transformations include:

  • Local angle calculation: $\theta = \operatorname{arctan2}(dx, dy)$
  • Velocity components: $u = -w \cdot g \cdot \cos(\theta)$, $v = w \cdot g \cdot \sin(\theta)$

Where:

  • $w$ is the value returned by geostrophic_velocity; because it is fed dynamic height in metres rather than a stream function in m² s⁻², multiplying by $g$ restores units of m s⁻¹
  • $g$ is gravitational acceleration
  • $\theta$ is the local track angle derived from the returned mid-point coordinates

The method uses forward differences at the first point, backward differences at the last, and centred differences in between, so that a velocity estimate is produced at every point along the transect.

The result is a DataFrame enriched with zonal and meridional velocity components, ready for further oceanographic analysis.
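The transformation described above can be sketched with hypothetical values standing in for the outputs of gsw.geostrophy.geostrophic_velocity (a speed value $w$ plus the mid-point coordinates of consecutive station pairs):

```python
import numpy as np

g = 9.81  # gravitational acceleration (m s^-2)

# Hypothetical inputs: w as returned by geostrophic_velocity when fed dynamic
# height in metres, and the mid-point coordinates it also returns
w = np.array([0.002, 0.003, 0.0025])
lon_mid = np.array([5.00, 5.02, 5.04])
lat_mid = np.array([42.00, 42.01, 42.02])

n = len(w)
u = np.zeros(n)
v = np.zeros(n)
for i in range(n):
    if i == 0:            # forward difference at the first point
        dx, dy = lon_mid[1] - lon_mid[0], lat_mid[1] - lat_mid[0]
    elif i == n - 1:      # backward difference at the last point
        dx, dy = lon_mid[-1] - lon_mid[-2], lat_mid[-1] - lat_mid[-2]
    else:                 # centred difference in the interior
        dx, dy = lon_mid[i+1] - lon_mid[i-1], lat_mid[i+1] - lat_mid[i-1]
    angle = np.arctan2(dx, dy)        # local track angle
    u[i] = -w[i] * g * np.cos(angle)  # zonal component
    v[i] =  w[i] * g * np.sin(angle)  # meridional component
```

Whatever the local angle, the resulting speed satisfies $\sqrt{u^2 + v^2} = |w| \, g$, i.e. the decomposition only redistributes the magnitude between the two components.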

In [ ]:
# 5.3 GEOSTROPHIC VELOCITY FROM MVP
    w_mvp_smooth = gsw.geostrophy.geostrophic_velocity(
        df_mvp['dh'].values,
        df_mvp['lon'].values,
        df_mvp['lat'].values
    )
    N_mvp = len(w_mvp_smooth[0])
    df_mvp = df_mvp.iloc[:N_mvp].copy()

    u_mvp = np.zeros(N_mvp)
    v_mvp = np.zeros(N_mvp)
    for i in range(N_mvp):
        if N_mvp > 1:
            if i == 0:
                dx = w_mvp_smooth[1][i+1] - w_mvp_smooth[1][i]
                dy = w_mvp_smooth[2][i+1] - w_mvp_smooth[2][i]
            elif i == N_mvp-1:
                dx = w_mvp_smooth[1][i] - w_mvp_smooth[1][i-1]
                dy = w_mvp_smooth[2][i] - w_mvp_smooth[2][i-1]
            else:
                dx = w_mvp_smooth[1][i+1] - w_mvp_smooth[1][i-1]
                dy = w_mvp_smooth[2][i+1] - w_mvp_smooth[2][i-1]
            angle = np.arctan2(dx, dy)
            u_mvp[i] = - w_mvp_smooth[0][i] * g_ * np.cos(angle)
            v_mvp[i] =   w_mvp_smooth[0][i] * g_ * np.sin(angle)

    df_mvp['u_mvp'] = u_mvp
    df_mvp['v_mvp'] = v_mvp

5.4 MATCH MVP DATES TO SWOT¶

5.4.1 Identify corresponding SWOT file¶

This code segment matches each MVP transect date with the corresponding SWOT observation date, ensuring precise temporal alignment for the subsequent comparison.

  1. Date Matching:

    • Searches through predefined SWOT dates
    • Finds the first matching date with the MVP transect
    • Converts date to string format for file search
  2. Domain Definition:

    • Sets a specific geographical domain for data extraction
    • Coordinates: Longitude [3°, 6.5°], Latitude [38.5°, 44°]
    • Focuses on the Northwestern Mediterranean region
  3. SWOT File Location:

    • Uses glob to search for SWOT NetCDF files
    • Matches files containing the specific date
    • Handles cases where no matching file is found
  4. Data Extraction:

    • Loads SWOT data using custom loadsshuv() function
    • Extracts key variables:
      • Absolute Dynamic Topography (ADT)
      • Longitude and Latitude
      • Zonal and Meridional velocities
    • Flattens multidimensional arrays for easier processing
  5. ADT Anomaly Computation:

    • Calculates mean ADT
    • Computes centered ADT anomaly by subtracting the mean

The method ensures precise spatial and temporal synchronization between in situ MVP measurements and satellite observations, a crucial step in cross-validation of oceanographic data.
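The date-matching logic can be sketched with hypothetical date lists standing in for dates_swot and the MVP transect date (the real lists are built earlier in the notebook):

```python
from datetime import date, datetime

# Hypothetical stand-ins for dates_swot and dates_mvp[0].date()
dates_swot = [date(2023, 4, 18), date(2023, 4, 19), date(2023, 4, 20)]
mvp_day = date(2023, 4, 19)

# First SWOT date equal to the MVP day, formatted for display and file search
dayv = next((d.strftime('%Y-%m-%d') for d in dates_swot if d == mvp_day), None)

# The compact form feeds a glob pattern such as f'*_{yyyymmdd}_*.nc'
yyyymmdd = datetime.strptime(dayv, '%Y-%m-%d').strftime('%Y%m%d') if dayv else None
```

When no date matches, `dayv` stays `None`, mirroring the notebook's `if not dayv: ... continue` guard.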

In [ ]:
## 5.4.1 Identify corresponding SWOT file
    dayv = None
    for dsw in dates_swot:
        if dsw == dates_mvp[0].date():
            dayv = dsw.strftime('%Y-%m-%d')
            break
    if not dayv:
        print(f"No matching (SWOT) date for PL{PL}")
        continue

    domain = [3, 38.5, 6.5, 44]
    dayv_dt = datetime.strptime(dayv, "%Y-%m-%d")
    varn_swot = {
        'longitude': 'longitude',
        'latitude' : 'latitude',
        'u'        : 'ugos',
        'v'        : 'vgos',
        'adt'      : 'adt'
    }
    fname = glob.glob(
        '/Volumes/Extreme SSD/MIO/SWOT_V2.0.0/v2.0.0/'
        f'SWOT_L3_LR_SSH_Expert_*{dayv_dt.strftime("%Y%m%d")}*_*.nc'
    )
    if not fname:
        print(f"No SWOT file found for date: {dayv} - PL{PL}")
        continue

    swot = loadsshuv(fname[0], '', varn_swot, type='swot', area=domain)
    adt_flat_swot = swot['adt'].flatten()
    lon_flat_swot = swot['lon'].flatten()
    lat_flat_swot = swot['lat'].flatten()
    u_swot_tot    = swot['u'].flatten()
    v_swot_tot    = swot['v'].flatten()

    adtmean_swot = np.nanmean(adt_flat_swot)
    adtc_swot = adt_flat_swot - adtmean_swot

5.4.2 Nearest-neighbour search to MVP transect¶

This code segment maps SWOT satellite data onto the MVP transect trajectory using a computationally efficient nearest-neighbour search.

  1. Data Filtering:

    • Creates a valid data mask to remove NaN values
    • Filters longitude, latitude, velocities, and ADT
    • Ensures data integrity for spatial matching
  2. Spatial Nearest-Neighbor Search:

    • Uses cKDTree from SciPy for efficient spatial querying
    • Constructs a k-dimensional tree from valid SWOT coordinates
    • Allows rapid identification of closest satellite data points
  3. Coordinate Matching:

    • Queries the k-d tree with MVP transect coordinates
    • Finds the single nearest point for each MVP measurement
    • Returns indices of the closest SWOT data points
  4. Data Extraction:

    • Retrieves matched SWOT velocity components
    • Extracts corresponding Absolute Dynamic Topography (ADT)

The technique ensures precise spatial alignment between in situ and satellite-derived measurements, critical for robust oceanographic cross-validation. By using nearest-neighbor interpolation, the method maintains the original resolution of both datasets while providing a direct point-to-point comparison.
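A minimal sketch of the k-d tree matching, with hypothetical SWOT grid points and MVP fixes:

```python
import numpy as np
from scipy.spatial import cKDTree

# Hypothetical SWOT grid points (lon, lat) with their ADT values,
# and a short MVP track to match against
swot_pts = np.array([[5.0, 42.0], [5.1, 42.0], [5.0, 42.1], [5.1, 42.1]])
swot_adt = np.array([0.10, 0.12, 0.11, 0.13])
mvp_pts = np.array([[5.01, 42.01], [5.09, 42.09]])

tree = cKDTree(swot_pts)                 # build once over all valid SWOT points
_, idx = tree.query(mvp_pts, k=1)        # nearest SWOT point per MVP fix
adt_on_track = swot_adt[idx]             # ADT sampled along the transect
```

Note that, as in the notebook, the tree operates in raw degree space: longitude separations are not compressed by cos(latitude), so at ~42°N meridional neighbours are slightly favoured. At SWOT's ~2 km grid spacing this approximation is harmless for nearest-neighbour matching.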

In [ ]:
# 5.4.2 Nearest-neighbour search to MVP transect
    valid_mask = ~np.isnan(lon_flat_swot) & ~np.isnan(lat_flat_swot)
    lon_valid = lon_flat_swot[valid_mask]
    lat_valid = lat_flat_swot[valid_mask]
    u_valid   = u_swot_tot[valid_mask]
    v_valid   = v_swot_tot[valid_mask]
    adt_valid = adt_flat_swot[valid_mask]

    tree = cKDTree(np.c_[lon_valid, lat_valid])
    _, indices = tree.query(
        np.c_[df_mvp['lon'].values, df_mvp['lat'].values], k=1
    )
    u_swot = u_valid[indices]
    v_swot = v_valid[indices]
    adt_on_transect = adt_valid[indices]

5.5 DIAGNOSTIC SCALING AND SWOT-FROM-ADT COMPUTATION¶

This code segment multiplies the MVP-derived velocity components by a large coefficient (100,000) before mapping.

The scaling is purely a display device. In the Mercator maps below, quiver is called with scale_units='xy' and scale=1, so vector components are interpreted in projection metres; an unscaled 0.1 m/s current would draw an arrow 0.1 m long, invisible on a map spanning hundreds of kilometres. Multiplying by coeff_scaled:

  • Makes small velocity magnitudes distinguishable (0.1 m/s becomes a 10 km arrow)
  • Keeps the datasets directly comparable, since the same factor is later applied to the SWOT velocities
  • Keeps the quiverkey reference arrow (0.10 × coeff_scaled) consistent with the plotted vectors

Mathematically, the transformation is simple:

  • $u_{\text{scaled}} = u_{\text{original}} \times 100{,}000$
  • $v_{\text{scaled}} = v_{\text{original}} \times 100{,}000$

Because every velocity product is scaled by the same coefficient, relative magnitudes and directions are preserved, and current patterns that would be imperceptible at their natural magnitude become visible.

In [ ]:
# 5.5 SCALING
    coeff_scaled = 100000
    u_mvp_scaled = df_mvp['u_mvp'].values * coeff_scaled
    v_mvp_scaled = df_mvp['v_mvp'].values * coeff_scaled

5.5.1 Geostrophic velocity from interpolated SWOT ADT¶

This code segment computes geostrophic velocities by interpolating and processing Absolute Dynamic Topography (ADT) from SWOT satellite data.

  1. Spatial Interpolation:

    • Uses griddata() for linear interpolation of ADT
    • Maps SWOT ADT values onto MVP transect coordinates
    • Ensures spatial alignment between datasets
  2. Geostrophic Velocity Computation:

    • Applies Gibbs Seawater geostrophic velocity function
    • Converts ADT gradients to velocity components
    • Handles edge cases and single-point scenarios
  3. Velocity Vector Transformation:

    • Calculates local coordinate system angles
    • Converts geostrophic stream function to zonal (u) and meridional (v) velocity components
    • Applies gravitational acceleration scaling

Key mathematical transformations include:

  • Angle calculation: $\theta = \operatorname{arctan2}(dx, dy)$
  • Velocity components: $u = -w \cdot g \cdot \cos(\theta)$, $v = w \cdot g \cdot \sin(\theta)$

Where:

  • $w$ is the value returned by geostrophic_velocity from the interpolated ADT (in metres), so the factor $g$ restores units of m s⁻¹
  • $g$ is gravitational acceleration
  • $\theta$ is the local track angle derived from the mid-point coordinates

The final step scales velocity components by 100,000 to enhance visualization and numerical comparability, similar to the MVP velocity scaling performed earlier.
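The interpolation step can be sketched with hypothetical scattered samples of a plane, for which linear interpolation is exact:

```python
import numpy as np
from scipy.interpolate import griddata

# Hypothetical scattered ADT samples lying on the plane z = 0.01*x + 0.02*y
pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
vals = 0.01 * pts[:, 0] + 0.02 * pts[:, 1]

# Query point inside the convex hull of the samples (like an MVP fix
# inside the SWOT swath); outside the hull, method='linear' returns NaN
query = (np.array([0.5]), np.array([0.5]))
adt_interp = griddata(pts, vals, query, method='linear')
# Linear interpolation reproduces the plane: 0.01*0.5 + 0.02*0.5 = 0.015
```

Unlike the nearest-neighbour matching of the previous step, `griddata` blends surrounding samples, which yields a smoother ADT profile along the transect before gradients are taken.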

In [ ]:
## 5.5.1 Geostrophic velocity from interpolated SWOT ADT
    adt_swot_interp = griddata(
        (lon_valid, lat_valid),
        adt_valid,
        (df_mvp['lon'].values, df_mvp['lat'].values),
        method='linear'
    )
    w_swot_calc_raw = gsw.geostrophy.geostrophic_velocity(
        adt_swot_interp,
        df_mvp['lon'].values,
        df_mvp['lat'].values
    )
    N2 = min(len(w_swot_calc_raw[0]),
             len(w_swot_calc_raw[1]),
             len(w_swot_calc_raw[2]))
    u_swot_calc = np.zeros(N2)
    v_swot_calc = np.zeros(N2)
    for i in range(N2):
        if N2 == 1:
            dx2 = dy2 = 0
        elif i == 0:
            dx2 = w_swot_calc_raw[1][i+1] - w_swot_calc_raw[1][i]
            dy2 = w_swot_calc_raw[2][i+1] - w_swot_calc_raw[2][i]
        elif i == N2-1:
            dx2 = w_swot_calc_raw[1][i] - w_swot_calc_raw[1][i-1]
            dy2 = w_swot_calc_raw[2][i] - w_swot_calc_raw[2][i-1]
        else:
            dx2 = w_swot_calc_raw[1][i+1] - w_swot_calc_raw[1][i-1]
            dy2 = w_swot_calc_raw[2][i+1] - w_swot_calc_raw[2][i-1]
        angle2 = np.arctan2(dx2, dy2)
        u_swot_calc[i] = - w_swot_calc_raw[0][i] * g_ * np.cos(angle2)
        v_swot_calc[i] =   w_swot_calc_raw[0][i] * g_ * np.sin(angle2)

    scale_calc = 100000
    u_swot_calc_scaled = u_swot_calc * scale_calc
    v_swot_calc_scaled = v_swot_calc * scale_calc

6. Visualization and Comparative Analysis¶

This section generates comprehensive visualizations to compare SWOT satellite and MVP in-situ measurements, allowing for qualitative assessment of their agreement and highlighting oceanographic features.

6.1 Map-Based Diagnostics¶

6.1.1 ADT-Based Velocity vs MVP¶

This visualization creates a map comparing geostrophic velocities derived from SWOT Absolute Dynamic Topography (ADT) against those calculated from MVP measurements. This comparison is fundamental to validating the satellite data's ability to capture ocean currents.

The plot construction involves several key steps:

  1. Map initialization with Basemap:

    • The projection is set to Mercator (projection='merc')
    • The spatial extent is determined by the MVP transect coordinates with a 0.4° buffer
    • The resolution is set to intermediate (resolution='i') for an appropriate level of coastline detail
    • Standard map elements (coastlines, country borders, graticules) are added for geographic context
  2. Background sea surface height field:

    • SWOT ADT values are plotted as a color-coded scatter plot
    • The Spectral_r colormap represents ADT anomalies from -10 cm to +10 cm
    • This background field visualizes the mesoscale structures (eddies, fronts) that drive geostrophic currents
  3. Velocity vector overlay:

    • Blue vectors show SWOT-derived velocities calculated from ADT gradients
    • Red vectors show MVP-derived velocities
    • Both are plotted at the same MVP sampling locations for direct comparison
    • Vectors are scaled to maintain visual clarity (scaling factor of 100,000)
  4. Contextual elements:

    • The frontal zone is highlighted in purple using the plot_fronte function
    • A reference velocity scale (quiverkey) of 0.10 m/s is provided
    • The colorbar, title, and legend complete the figure information

This visualization allows for immediate qualitative assessment of:

  • Spatial patterns in sea surface height (eddies, fronts)
  • Agreement between satellite and in-situ velocity directions and magnitudes
  • Relationship between velocity fields and oceanographic features

The map provides crucial insight into how well SWOT captures the geostrophic flow field as compared to direct in-situ measurements, particularly around the frontal zone where strong horizontal gradients are expected.

In [ ]:
### 6.1.1 ADT-based velocity vs MVP
    min_len_calc = min(N2, len(df_mvp))
    plt.figure(figsize=(12, 8))
    m = Basemap(
        projection='merc',
        llcrnrlat=df_mvp['lat'].min() - 0.4,
        urcrnrlat=df_mvp['lat'].max() + 0.4,
        llcrnrlon=df_mvp['lon'].min() - 0.4,
        urcrnrlon=df_mvp['lon'].max() + 0.4,
        resolution='i'
    )
    m.drawcoastlines()
    m.drawcountries()
    m.drawmapboundary(fill_color='lightblue')
    m.fillcontinents(color='beige', lake_color='lightblue')
    m.drawparallels(np.arange(-90., 91., 1.), labels=[True, False, False, False])
    m.drawmeridians(np.arange(-180., 181., 1.), labels=[False, False, False, True])

    x_swot_map, y_swot_map = m(lon_valid, lat_valid)
    sc = m.scatter(
        x_swot_map, y_swot_map,
        c=adt_valid - np.nanmean(adt_valid),
        cmap='Spectral_r', vmin=-0.1, vmax=0.1, zorder=1
    )

    x_pl_map_1, y_pl_map_1 = m(
        df_mvp['lon'].values[:min_len_calc],
        df_mvp['lat'].values[:min_len_calc]
    )
    m.scatter(x_pl_map_1, y_pl_map_1, color='black', label=f'PL{PL}', zorder=5, s=10)

    q1 = m.quiver(
        x_pl_map_1, y_pl_map_1,
        u_swot_calc_scaled[:min_len_calc],
        v_swot_calc_scaled[:min_len_calc],
        angles='xy', scale_units='xy', scale=1,
        color=swot_color_adt, zorder=5,
        label='SWOT velocity (from ADT)'
    )
    m.quiver(
        x_pl_map_1, y_pl_map_1,
        u_mvp_scaled[:min_len_calc],
        v_mvp_scaled[:min_len_calc],
        angles='xy', scale_units='xy', scale=1,
        color=mvp_color, zorder=10, label='MVP velocity'
    )

    plt.quiverkey(q1, 0.80, 0.1, 0.10 * scale_calc, '0.10 m/s',
                  labelpos='E', coordinates='axes', color='black')
    plt.colorbar(sc, label='SWOT adt (m) [unsmoothed]')
    plt.title(f'PL{PL} - {dayv_dt.strftime("%Y-%m-%d")} - (1) ADT-based Velocity vs MVP')
    plot_fronte(m)
    plt.legend(loc='upper right')
    plt.show()

6.1.2 Full SWOT Velocity vs MVP¶

This visualization compares the full SWOT velocity field (provided directly in the SWOT data product) with the MVP-derived velocities. Unlike the previous figure that used velocities calculated from ADT gradients, this plot utilizes the pre-computed geostrophic velocity components (ugos, vgos) from the SWOT database.

The map structure follows the same general layout as the previous figure, with the following key differences:

  1. Vector data source:

    • Green vectors represent the total SWOT velocities from the database
    • These velocities may incorporate additional processing steps beyond simple ADT gradients
    • They represent the official SWOT geostrophic velocity product
  2. Vector scaling:

    u_swot_scaled_tot = u_swot[:min_len_uv] * coeff_scaled
    v_swot_scaled_tot = v_swot[:min_len_uv] * coeff_scaled
    

    The scaling coefficient (coeff_scaled, set to 100,000) ensures that the vectors are visually interpretable despite the small magnitude of ocean currents (typically centimeters to tens of centimeters per second).

  3. Coordinate system: Unlike MVP velocities which are naturally perpendicular to the ship track, these SWOT velocities are provided in standard zonal and meridional components. This means they represent the full 2D velocity field rather than just the cross-track component.

This comparison is important because it reveals differences between two approaches to deriving velocity from SWOT data:

  • The ADT-gradient approach (previous figure) which uses the geostrophic equations applied to SWOT SSH
  • The official SWOT velocity product which may include additional processing or corrections

By comparing both SWOT velocity types against MVP measurements, we can determine which better represents the actual ocean currents. Potential differences might arise from:

  1. Different filtering or smoothing applied to the SSH field before computing gradients
  2. Corrections for known biases or artifacts in the SWOT measurements
  3. Adjustments to account for the mean dynamic topography

The color-coded background continues to show the ADT anomaly field, providing context for the velocity vectors and highlighting the relationship between sea surface height patterns and the associated geostrophic flow.

In [ ]:
### 6.1.2 Full SWOT velocity vs MVP
    min_len_uv = min(len(u_swot), len(df_mvp))
    plt.figure(figsize=(12, 8))
    m = Basemap(
        projection='merc',
        llcrnrlat=df_mvp['lat'].min() - 0.4,
        urcrnrlat=df_mvp['lat'].max() + 0.4,
        llcrnrlon=df_mvp['lon'].min() - 0.4,
        urcrnrlon=df_mvp['lon'].max() + 0.4,
        resolution='i'
    )
    m.drawcoastlines()
    m.drawcountries()
    m.drawmapboundary(fill_color='lightblue')
    m.fillcontinents(color='beige', lake_color='lightblue')
    m.drawparallels(np.arange(-90., 91., 1.), labels=[True, False, False, False])
    m.drawmeridians(np.arange(-180., 181., 1.), labels=[False, False, False, True])

    x_pl, y_pl = m(
        df_mvp['lon'].values[:min_len_uv],
        df_mvp['lat'].values[:min_len_uv]
    )
    x_swot, y_swot = m(lon_valid, lat_valid)
    sc = m.scatter(
        x_swot, y_swot,
        c=adt_valid - np.nanmean(adt_valid),
        cmap='Spectral_r', vmin=-0.1, vmax=0.1, zorder=1
    )
    m.scatter(x_pl, y_pl, color='black', label=f'PL{PL}', zorder=5, s=10)

    u_swot_scaled_tot = u_swot[:min_len_uv] * coeff_scaled
    v_swot_scaled_tot = v_swot[:min_len_uv] * coeff_scaled

    q2 = m.quiver(
        x_pl, y_pl,
        u_swot_scaled_tot, v_swot_scaled_tot,
        angles='xy', scale_units='xy', scale=1,
        color=swot_color_total, zorder=5,
        label='SWOT velocity (TOTAL)'
    )
    m.quiver(
        x_pl, y_pl,
        u_mvp_scaled[:min_len_uv], v_mvp_scaled[:min_len_uv],
        angles='xy', scale_units='xy', scale=1,
        color=mvp_color, zorder=10, label='MVP velocity'
    )

    plt.quiverkey(q2, 0.80, 0.1, 0.10 * coeff_scaled, '0.10 m/s',
                  labelpos='E', coordinates='axes', color='black')
    plt.colorbar(sc, label='SWOT adt (m)')
    plt.title(f'PL{PL} - {dayv_dt.strftime("%Y-%m-%d")} - (2) FULL Velocity')
    plot_fronte(m)
    plt.legend(loc='upper right')
    plt.show()

6.1.3 Projected SWOT Velocity vs MVP¶

This visualization addresses a fundamental challenge in comparing SWOT and MVP velocity measurements: different reference frames. While SWOT provides velocities in standard zonal/meridional components, MVP naturally measures flow perpendicular to the ship track. This figure projects the SWOT velocities onto the direction perpendicular to the MVP transect for direct comparison.

The projection process involves two main steps:

  1. Calculate scalar projection: The function compute_signed_speed_along_transect determines the signed magnitude of SWOT velocity perpendicular to the transect:

    w_swot_proj_sign_map = compute_signed_speed_along_transect(
       u_swot, v_swot, df_mvp['lon'].values, df_mvp['lat'].values
    )
    

    This yields the speed component in the direction normal to the transect, with sign indicating direction.

  2. Convert to vector representation: The scalar projections are transformed back into vector components, but now aligned with the local normal direction:

    nx, ny = dy, -dx  # Normal vector to transect (perpendicular)
    nx, ny = nx / norm_n, ny / norm_n  # Normalized
    u_swot_proj_map[i] = scalar_proj * nx
    v_swot_proj_map[i] = scalar_proj * ny
    

This approach ensures that:

  • SWOT velocity vectors (dark green) are oriented perpendicular to the transect
  • Their magnitude represents only the cross-track component
  • They can be directly compared with MVP velocities (red)

The figure uses the same map layout and ADT background as previous visualizations, but now with the projected SWOT velocities represented by dark green arrows. This projection method offers the most direct comparison between satellite and in-situ measurements by aligning their reference frames.

Key oceanographic insights from this figure include:

  • How well SWOT captures cross-frontal flows
  • Whether the satellite detects the same small-scale features as the in-situ measurements
  • Areas where SWOT might overestimate or underestimate velocity magnitude

The visualization provides the most rigorous comparison of SWOT and MVP velocities by ensuring they represent the same physical quantity: the geostrophic flow perpendicular to the transect direction.
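A sketch of what such a projection helper might look like (`signed_cross_track_speed` is a hypothetical stand-in for compute_signed_speed_along_transect, whose actual implementation is defined elsewhere in the notebook):

```python
import numpy as np

def signed_cross_track_speed(u, v, lon, lat):
    """Project (u, v) onto the local normal (dy, -dx) of the track.

    Hypothetical sketch: forward/backward differences at the endpoints,
    centred differences in between, as in the notebook's loop."""
    n = len(lon)
    s = np.zeros(n)
    for i in range(n):
        if i == 0:
            dx, dy = lon[1] - lon[0], lat[1] - lat[0]
        elif i == n - 1:
            dx, dy = lon[-1] - lon[-2], lat[-1] - lat[-2]
        else:
            dx, dy = lon[i+1] - lon[i-1], lat[i+1] - lat[i-1]
        nx, ny = dy, -dx                      # normal to the local track
        norm = np.hypot(nx, ny)
        s[i] = (u[i] * nx + v[i] * ny) / norm if norm > 1e-10 else 0.0
    return s

# Track running due east: the normal points south, so a northward
# current projects with a negative sign, while along-track flow projects to zero
lon = np.array([5.0, 5.1, 5.2])
lat = np.array([42.0, 42.0, 42.0])
u = np.array([0.0, 0.0, 0.0])   # zonal component (m/s)
v = np.array([0.1, 0.1, 0.1])   # meridional component (m/s)
```

The sign convention thus depends on the direction the ship travelled, which is why the notebook later flips signs (see 6.2.1) to put all products on a common convention.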

In [ ]:
### 6.1.3 Projected SWOT velocity vs MVP
    w_swot_proj_sign_map = compute_signed_speed_along_transect(
        u_swot, v_swot, df_mvp['lon'].values, df_mvp['lat'].values
    )
    u_swot_proj_map = np.zeros(len(df_mvp))
    v_swot_proj_map = np.zeros(len(df_mvp))
    for i in range(len(df_mvp)):
        if len(df_mvp) == 1:
            dx = dy = 0
        elif i == 0:
            dx = df_mvp['lon'].values[i+1] - df_mvp['lon'].values[i]
            dy = df_mvp['lat'].values[i+1] - df_mvp['lat'].values[i]
        elif i == len(df_mvp) - 1:
            dx = df_mvp['lon'].values[i] - df_mvp['lon'].values[i-1]
            dy = df_mvp['lat'].values[i] - df_mvp['lat'].values[i-1]
        else:
            dx = df_mvp['lon'].values[i+1] - df_mvp['lon'].values[i-1]
            dy = df_mvp['lat'].values[i+1] - df_mvp['lat'].values[i-1]
        nx, ny = dy, -dx
        norm_n = np.sqrt(nx**2 + ny**2)
        if norm_n < 1e-10:
            u_swot_proj_map[i] = 0.
            v_swot_proj_map[i] = 0.
        else:
            nx, ny = nx / norm_n, ny / norm_n
            scalar_proj = w_swot_proj_sign_map[i]
            u_swot_proj_map[i] = scalar_proj * nx
            v_swot_proj_map[i] = scalar_proj * ny

    plt.figure(figsize=(12, 8))
    m = Basemap(
        projection='merc',
        llcrnrlat=df_mvp['lat'].min() - 0.4,
        urcrnrlat=df_mvp['lat'].max() + 0.4,
        llcrnrlon=df_mvp['lon'].min() - 0.4,
        urcrnrlon=df_mvp['lon'].max() + 0.4,
        resolution='i'
    )
    m.drawcoastlines()
    m.drawcountries()
    m.drawmapboundary(fill_color='lightblue')
    m.fillcontinents(color='beige', lake_color='lightblue')
    m.drawparallels(np.arange(-90., 91., 1.), labels=[True, False, False, False])
    m.drawmeridians(np.arange(-180., 181., 1.), labels=[False, False, False, True])

    x_swot, y_swot = m(lon_valid, lat_valid)
    sc = m.scatter(
        x_swot, y_swot,
        c=adt_valid - np.nanmean(adt_valid),
        cmap='Spectral_r', vmin=-0.1, vmax=0.1, zorder=1
    )
    x_pl, y_pl = m(df_mvp['lon'].values, df_mvp['lat'].values)
    m.scatter(x_pl, y_pl, color='black', label=f'PL{PL}', zorder=5, s=10)

    u_swot_proj_scaled_map = u_swot_proj_map * coeff_scaled
    v_swot_proj_scaled_map = v_swot_proj_map * coeff_scaled
    q3 = m.quiver(
        x_pl, y_pl,
        u_swot_proj_scaled_map, v_swot_proj_scaled_map,
        angles='xy', scale_units='xy', scale=1,
        color=swot_color_proj, zorder=5,
        label='SWOT velocity (Projected)'
    )
    m.quiver(
        x_pl, y_pl,
        u_mvp_scaled, v_mvp_scaled,
        angles='xy', scale_units='xy', scale=1,
        color=mvp_color, zorder=10, label='MVP velocity'
    )
    plt.quiverkey(q3, 0.80, 0.1, 0.10 * coeff_scaled, '0.10 m/s',
                  labelpos='E', coordinates='axes', color='black')
    plt.colorbar(sc, label='SWOT adt (m)')
    plt.title(
        f'PL{PL} - {dayv_dt.strftime("%Y-%m-%d")} - (3) PROJECTED Velocity (New Method)'
    )
    plot_fronte(m)
    plt.legend(loc='upper right')
    plt.show()

6.2 One-dimensional Signed Velocity Comparison¶

6.2.1 Signed Velocity Profiles¶

While map-based visualizations provide spatial context, one-dimensional profile plots enable quantitative comparison of velocity magnitudes along the transect. This figure displays signed velocity components from multiple sources plotted against distance along the MVP transect.

The code first calculates signed velocity components from five different sources:

  1. MVP signed velocity (w_mvp_sign): The reference in-situ measurement, naturally perpendicular to the transect
  2. SWOT total signed velocity (w_swot_tot_sign): Full magnitude of SWOT velocity with sign based on cross-track direction
  3. SWOT projected signed velocity (w_swot_proj_sign): Only the component perpendicular to the transect
  4. SWOT ADT-based signed velocity (w_swot_calc_sign): Velocity calculated directly from SWOT ADT gradients
  5. DUACS signed velocity (w_duacs_sign_full): Velocity from the multi-mission merged altimetry product

For DUACS velocity calculation, the code:

  1. Loads DUACS ADT along the MVP transect using the load_DUACS function
  2. Computes geostrophic velocities using GSW's geostrophic_velocity function
  3. Converts streamfunction to vector components based on local transect angles
  4. Projects the velocities perpendicular to the transect

The negative signs in the calculations (e.g., w_swot_tot_sign = -compute_signed_speed_tot(...)) enforce a common sign convention across products, so that positive values correspond to flow in the same cross-track direction for every dataset.

The resulting line plot shows:

  • Distance along transect (km) on the x-axis
  • Signed velocity (m/s) on the y-axis
  • Five differently colored and marked lines representing each data source

This visualization reveals:

  • Magnitude differences: Whether SWOT systematically overestimates or underestimates velocities
  • Pattern agreement: Whether velocity peaks and troughs occur at the same locations
  • Small-scale features: Which high-frequency variations are captured by different products
  • Comparative performance: How SWOT compares to the established DUACS product

Most importantly, this plot quantifies discrepancies between satellite and in-situ measurements in a way that the map visualizations cannot, enabling statistical analysis of the differences. Sharp transitions in the profiles often indicate crossing oceanographic fronts or the edges of eddies, with the agreement between profiles at these features being particularly important for validating SWOT's ability to resolve fine-scale ocean dynamics.
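The statistical comparison this enables can be sketched with hypothetical profile values; RMSE and Pearson correlation are natural summary metrics here, though the notebook's own diagnostics may differ:

```python
import numpy as np

# Hypothetical signed-velocity profiles on a common grid (m/s)
w_mvp  = np.array([0.05, 0.10, 0.02, -0.04, -0.08])
w_swot = np.array([0.04, 0.12, 0.01, -0.03, -0.10])

# Root-mean-square error quantifies magnitude discrepancies
rmse = np.sqrt(np.mean((w_swot - w_mvp) ** 2))

# Pearson correlation quantifies pattern agreement (peak/trough locations)
r = np.corrcoef(w_mvp, w_swot)[0, 1]
```

High correlation with nonzero RMSE would indicate that SWOT captures the spatial pattern but with a magnitude bias; the converse would point to displaced or unresolved features.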

In [ ]:
### 6.2.1 Signed velocity profiles
    w_swot_tot_sign = - compute_signed_speed_tot(
        u_swot, v_swot, df_mvp['lon'].values, df_mvp['lat'].values
    )
    w_swot_proj_sign = - compute_signed_speed_along_transect(
        u_swot, v_swot, df_mvp['lon'].values, df_mvp['lat'].values
    )
    w_mvp_sign = - compute_signed_speed_along_transect(
        df_mvp['u_mvp'].values, df_mvp['v_mvp'].values,
        df_mvp['lon'].values, df_mvp['lat'].values
    )
    w_swot_calc_sign = - compute_signed_speed_along_transect(
        u_swot_calc, v_swot_calc,
        df_mvp['lon'].values[:N2], df_mvp['lat'].values[:N2]
    )

    duacsdir = '/Volumes/Extreme SSD/MIO/DUACS/'
    savefig  = ''

    # DUACS reference velocity
    adt_duacs = load_DUACS(df_mvp, dayv, duacsdir, savefig)
    if adt_duacs is not None:
        w_duacs_raw = gsw.geostrophy.geostrophic_velocity(
            adt_duacs, df_mvp['lon'].values, df_mvp['lat'].values
        )
        N_duacs = len(w_duacs_raw[0])
        u_duacs = np.zeros(N_duacs)
        v_duacs = np.zeros(N_duacs)
        for i in range(N_duacs):
            if N_duacs == 1:
                dx_d = dy_d = 0
            elif i == 0:
                dx_d = w_duacs_raw[1][i+1] - w_duacs_raw[1][i]
                dy_d = w_duacs_raw[2][i+1] - w_duacs_raw[2][i]
            elif i == N_duacs - 1:
                dx_d = w_duacs_raw[1][i] - w_duacs_raw[1][i-1]
                dy_d = w_duacs_raw[2][i] - w_duacs_raw[2][i-1]
            else:
                dx_d = w_duacs_raw[1][i+1] - w_duacs_raw[1][i-1]
                dy_d = w_duacs_raw[2][i+1] - w_duacs_raw[2][i-1]
            angle_d = np.arctan2(dx_d, dy_d)
            u_duacs[i] = - w_duacs_raw[0][i] * g_ * np.cos(angle_d)
            v_duacs[i] =   w_duacs_raw[0][i] * g_ * np.sin(angle_d)
        w_duacs_sign_full = - compute_signed_speed_along_transect(
            u_duacs, v_duacs,
            df_mvp['lon'].values, df_mvp['lat'].values
        )
    else:
        w_duacs_sign_full = np.full(len(df_mvp), np.nan)

    min_len_4 = min(
        len(w_mvp_sign), len(w_swot_tot_sign), len(w_swot_proj_sign),
        len(w_swot_calc_sign), len(w_duacs_sign_full)
    )
    x_4              = df_mvp['dist'].values[:min_len_4]
    w_mvp_4          = w_mvp_sign[:min_len_4]
    w_swot_tot_4     = w_swot_tot_sign[:min_len_4]
    w_swot_proj_4    = w_swot_proj_sign[:min_len_4]
    w_swot_calc_4    = w_swot_calc_sign[:min_len_4]
    w_duacs_4        = w_duacs_sign_full[:min_len_4]

    duacs_color = "magenta"

    plt.figure()
    plt.plot(x_4, w_mvp_4,          'o-', color=mvp_color,        label='MVP (signed)')
    plt.plot(x_4, w_swot_tot_4,     's-', color=swot_color_total, label='SWOT TOTAL (signed)')
    plt.plot(x_4, w_swot_proj_4,    '^-', color=swot_color_proj,  label='SWOT Projected (signed)')
    plt.plot(x_4, w_swot_calc_4,    'd-', color=swot_color_adt,   label='SWOT from ADT (signed)')
    plt.plot(x_4, w_duacs_4,        'p-', color=duacs_color,      label='DUACS (signed)')
    plt.grid(True)
    plt.legend()
    plt.xlabel('Distance along transect (km)')
    plt.ylabel('Velocity (m/s) [signed]')
    plt.title(
        f'PL{PL} - (4) Signed Velocity Comparison '
        '(MVP, TOTAL, Projected, ADT-based, DUACS)'
    )
    plt.show()

6.3 Extra Diagnostic Plots¶

This section prepares additional datasets for detailed diagnostic visualizations that directly compare sea surface height measurements from different sources. These comparisons are essential for understanding the agreement between satellite altimetry products and in-situ observations.

The code first loads interpolated ADT data from both SWOT and DUACS at MVP sampling points:

adt_swot_extra = load_SWOT(df_mvp, dayv, swotdir, savefig)
adt_duacs = load_DUACS(df_mvp, dayv, duacsdir, savefig)

For each dataset, two versions are prepared:

  1. Raw topography (adt_swot_extra, adt_duacs): The absolute dynamic topography values directly from the altimetry products
  2. Centered anomalies (adtc_swot_extra, adtc_duacs): Anomalies calculated by subtracting the mean value from each dataset:
    adtmean_swot_extra = np.nanmean(adt_swot_extra)
    adtc_swot_extra = adt_swot_extra - adtmean_swot_extra
    

This centering operation is important because:

  • Different products may have different reference levels
  • We're primarily interested in the spatial patterns rather than absolute values
  • It facilitates direct visual comparison by bringing all datasets to a common baseline

The final step ensures all arrays are trimmed to the same length by limiting them to the length of the MVP dataframe:

L_full = len(df_mvp)
adt_swot_extra = adt_swot_extra[:L_full]
adtc_swot_extra = adtc_swot_extra[:L_full]
adt_duacs = adt_duacs[:L_full]
adtc_duacs = adtc_duacs[:L_full]

These prepared datasets will be used in the subsequent visualization plots to directly compare:

  1. MVP dynamic height with satellite ADT measurements
  2. Raw topography vs. centered anomalies
  3. Differences between various data sources

Such comparisons are vital for validating satellite measurements against in-situ data and understanding the strengths and limitations of different altimetry products.
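The effect of the centering step can be illustrated with two synthetic series that share the same spatial pattern but differ by a constant reference offset; the variable names here are illustrative only:

```python
import numpy as np

signal = 0.05 * np.sin(np.linspace(0, 4 * np.pi, 80))  # common SSH pattern (m)
adt_a = signal + 0.40                                  # product A: reference ~ +0.40 m
adt_b = signal - 0.15                                  # product B: reference ~ -0.15 m

# Raw RMSD is dominated entirely by the reference offset...
rmsd_raw = np.sqrt(np.nanmean((adt_a - adt_b) ** 2))

# ...while centering (subtracting each series' mean) removes it
adtc_a = adt_a - np.nanmean(adt_a)
adtc_b = adt_b - np.nanmean(adt_b)
rmsd_centered = np.sqrt(np.nanmean((adtc_a - adtc_b) ** 2))
```

Here `rmsd_raw` equals the 0.55 m offset while `rmsd_centered` collapses to zero, which is exactly why the centered anomalies, not the raw values, are used for pattern comparison.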

In [ ]:
## 6.3 Extra diagnostic plots
    swotdir = '/Volumes/Extreme SSD/MIO/SWOT_V2.0.0/v2.0.0/'
    savefig = '/Volumes/Extreme SSD/MIO/MVP_TOOLS/BIOSWOTMED/figures'

    adt_swot_extra = load_SWOT(df_mvp, dayv, swotdir, savefig)
    if adt_swot_extra is None:
        continue
    adtmean_swot_extra = np.nanmean(adt_swot_extra)
    adtc_swot_extra = adt_swot_extra - adtmean_swot_extra

    adt_duacs = load_DUACS(df_mvp, dayv, duacsdir, savefig)
    if adt_duacs is None:
        continue
    adtmean_duacs = np.nanmean(adt_duacs)
    adtc_duacs = adt_duacs - adtmean_duacs

    # Trim satellite arrays to MVP transect length
    L_full = len(df_mvp)
    adt_swot_extra  = adt_swot_extra[:L_full]
    adtc_swot_extra = adtc_swot_extra[:L_full]
    adt_duacs       = adt_duacs[:L_full]
    adtc_duacs      = adtc_duacs[:L_full]

6.3.1 Centered Anomalies and Raw Topography¶

This diagnostic plot provides a direct comparison between raw dynamic topography and centered anomalies from three different data sources (MVP, SWOT, and DUACS) along the MVP transect. This visualization is crucial for understanding how different reference levels affect the comparison between satellite and in-situ measurements.

The figure contains six lines, arranged in three pairs:

  1. MVP measurements:

    • Solid red line: Centered dynamic height anomaly (dhc)
    • Dashed red line: Raw dynamic height (dh)

    These represent the in-situ reference measurements calculated from MVP temperature and salinity profiles.

  2. SWOT measurements:

    • Solid blue line: Centered ADT anomaly (adtc_swot_extra)
    • Dashed blue line: Raw ADT (adt_swot_extra)

    These show the sea surface height measured by the SWOT satellite.

  3. DUACS measurements:

    • Solid green line: Centered ADT anomaly (adtc_duacs)
    • Dashed green line: Raw ADT (adt_duacs)

    These represent the merged multi-mission altimetry product, providing a well-established reference.

A horizontal gray line at y=0 provides a reference for the centered anomalies.

This visualization reveals several important aspects:

  1. Vertical offsets: The raw topography lines (dashed) show substantial vertical shifts between datasets, highlighting the different reference levels used by each source. These offsets are primarily due to:

    • Different mean dynamic topography models
    • Different integration reference depths in dynamic height calculations
    • Instrumental biases
  2. Pattern agreement: Despite the vertical offsets, the centered anomalies (solid lines) should show similar spatial patterns if all sources are capturing the same oceanographic features.

  3. Amplitude differences: Variations in the amplitude of oscillations between datasets may indicate differences in spatial resolution, smoothing, or the sensitivity to fine-scale features.

By examining both raw and centered versions together, we can distinguish between issues related to reference levels (affecting the absolute values) and those related to the actual measurement of oceanographic features (affecting the spatial patterns). This distinction is crucial for proper validation of SWOT measurements against in-situ observations.
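Distinguishing reference-level issues from pattern issues amounts to decomposing the difference series into its mean and its residual. A minimal sketch with synthetic data (the names are illustrative, not variables from the notebook):

```python
import numpy as np

x = np.linspace(0, 60, 120)                        # distance along transect (km)
dh_mvp   = 0.04 * np.cos(x / 8)                    # in-situ pattern (m)
adt_swot = dh_mvp + 0.32 + 0.005 * np.sin(x / 3)   # same pattern + offset + residual

diff = adt_swot - dh_mvp
offset = np.nanmean(diff)        # reference-level disagreement (m)
pattern_rms = np.nanstd(diff)    # disagreement in the spatial pattern (m)
```

A large `offset` with a small `pattern_rms` points to differing reference levels rather than a failure to capture the oceanographic features.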

In [ ]:
### 6.3.1 Centred anomalies and raw topography
    plt.figure()
    plt.plot(df_mvp['dist'], df_mvp['dhc'],
             label='MVP (centered adt)', color=mvp_color, zorder=5)
    plt.plot(df_mvp['dist'], df_mvp['dh'],
             '--', label='MVP (raw topography)', color=mvp_color, zorder=5)
    plt.plot(df_mvp['dist'], adtc_swot_extra,
             label='SWOT (centered adt)', color=swot_color_adt, zorder=3)
    plt.plot(df_mvp['dist'], adt_swot_extra,
             '--', label='SWOT (raw topography)', color=swot_color_adt, zorder=3)
    plt.plot(df_mvp['dist'], adtc_duacs,
             label='DUACS (centered adt)', color=swot_color_total, zorder=2)
    plt.plot(df_mvp['dist'], adt_duacs,
             '--', label='DUACS (raw topography)', color=swot_color_total, zorder=2)
    plt.plot([df_mvp['dist'].min(), df_mvp['dist'].max()], [0, 0],
             '-', color='grey', zorder=1)
    plt.xlabel('Distance along transect (km)')
    plt.ylabel('Dynamic Topography Anomaly (m)')
    plt.legend(loc='lower right', fontsize=7)
    plt.title(f'PL{PL} - Extra Plot 1: Centered anomalies vs raw topography')
    plt.show()

6.3.2 Raw Topography Comparison¶

This figure isolates the raw (non-centered) dynamic topography measurements from the three different sources, allowing for direct comparison of absolute values along the transect. By focusing only on the raw measurements, we can better assess the absolute differences between satellite and in-situ observations.

The plot contains three lines:

  1. MVP raw dynamic height (red line):

    • Calculated from in-situ temperature and salinity profiles
    • Represents the vertical integration of density anomalies from a reference pressure level
    • Units are in meters of dynamic height
  2. SWOT raw ADT (blue line):

    • Absolute Dynamic Topography from SWOT measurements
    • Includes both the Mean Dynamic Topography (MDT) and Sea Level Anomaly (SLA)
    • Represents the total sea surface height relative to the geoid
  3. DUACS raw ADT (green line):

    • ADT from the merged multi-mission altimetry product
    • Provides a well-established reference for comparison

The vertical offsets between these measurements are particularly informative:

  • MVP vs. Satellite offsets: Differences between MVP and satellite measurements often result from different reference levels. The MVP dynamic height is calculated relative to a reference pressure level (typically the deepest measurement), while satellite ADT is referenced to the geoid.

  • SWOT vs. DUACS offsets: Differences between the two satellite products might indicate variations in the MDT models used, calibration approaches, or processing methodologies.

The shapes of the curves, independent of their vertical positions, reveal how well each dataset captures the spatial variations in dynamic topography. Similar patterns with consistent vertical offsets suggest good agreement in detecting oceanographic features, while divergent patterns would indicate more fundamental measurement differences.

This analysis of raw topography is essential for understanding the absolute calibration of different measurement systems and identifying systematic biases that might need correction in the satellite products.
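If the offsets are constant along the transect, a single calibration number can be estimated from the raw series. A minimal sketch with synthetic data, using the median rather than the mean so that a single bad point does not bias the estimate (all names here are illustrative):

```python
import numpy as np

x = np.linspace(0, 40, 80)
dh_mvp  = 0.03 * np.sin(x / 5)   # raw in-situ dynamic height (m)
adt_raw = dh_mvp + 0.45          # raw satellite ADT: same shape, constant offset
adt_raw[10] += 0.5               # one spurious point

diff = adt_raw - dh_mvp
offset_mean   = np.nanmean(diff)     # pulled upward by the outlier
offset_median = np.nanmedian(diff)   # robust estimate of the systematic bias
```

Subtracting `offset_median` from the satellite series would bring the two raw curves onto a common level without assuming anything about which reference is "correct".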

In [ ]:
### 6.3.2 Raw topography comparison
    plt.figure()
    plt.plot(df_mvp['dist'], df_mvp['dh'],
             label='MVP (raw topography)', color=mvp_color, zorder=5)
    plt.plot(df_mvp['dist'], adt_swot_extra,
             label='SWOT (raw topography)', color=swot_color_adt, zorder=3)
    plt.plot(df_mvp['dist'], adt_duacs,
             label='DUACS (raw topography)', color=swot_color_total, zorder=2)
    plt.xlabel('Distance along transect (km)')
    plt.ylabel('Dynamic Topography (m)')
    plt.legend(loc='lower right', fontsize=7)
    plt.title(f'PL{PL} - Extra Plot 2: Raw dynamic topography comparison')
    plt.show()

6.3.3 Centered Anomalies Comparison¶

This visualization focuses exclusively on the centered anomalies of dynamic topography from MVP, SWOT, and DUACS, allowing for a direct comparison of spatial patterns without the influence of different reference levels. By centering each dataset (subtracting its mean value), we can more clearly assess how well the satellite measurements capture the relative variations in sea surface height.

The plot contains three lines representing centered anomalies:

  1. MVP centered dynamic height (dhc, red line):

    • Derived from in-situ measurements and centered by subtracting the mean
    • Represents the true local variations in dynamic height relative to the transect average
    • Serves as our ground-truth reference
  2. SWOT centered ADT (adtc_swot_extra, blue line):

    • SWOT ADT with its mean subtracted
    • Highlights the satellite's ability to capture spatial variations in sea surface height
  3. DUACS centered ADT (adtc_duacs, green line):

    • Centered multi-mission altimetry data
    • Provides a comparison with the established standard in satellite altimetry

With the centering operation, all three curves should theoretically align if they perfectly capture the same oceanographic features. The degree of alignment therefore indicates the quality of the satellite measurements relative to in-situ observations.

Key aspects to analyze in this figure include:

  • Amplitude agreement: Do the satellite products capture the same magnitude of variations as the in-situ measurements?
  • Phase alignment: Do peaks and troughs occur at the same locations along the transect?
  • Fine-scale features: Are smaller-scale variations consistently detected across all datasets?
  • SWOT vs. DUACS performance: Does the new SWOT mission show improved agreement with in-situ data compared to the established DUACS product?

This centered comparison is particularly valuable for evaluating SWOT's ability to resolve mesoscale and submesoscale oceanographic features, which is one of the primary scientific objectives of the mission. It helps quantify how well SWOT captures the spatial gradients that drive geostrophic currents, independent of absolute calibration issues.
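The amplitude and phase questions above can be quantified with two simple statistics on the centered series. A sketch with synthetic, NaN-gapped data (illustrative names, not notebook variables):

```python
import numpy as np

x = np.linspace(0, 50, 100)
dhc_mvp   = 0.05 * np.sin(x / 6)    # centered in-situ anomaly (m)
adtc_swot = 0.04 * np.sin(x / 6)    # same phase, smaller amplitude
adtc_swot[::7] = np.nan             # gaps, e.g. from interpolation

ok = ~np.isnan(dhc_mvp) & ~np.isnan(adtc_swot)
corr = np.corrcoef(dhc_mvp[ok], adtc_swot[ok])[0, 1]   # phase alignment
amp_ratio = np.nanstd(adtc_swot) / np.nanstd(dhc_mvp)  # amplitude agreement
```

A correlation near 1 with an amplitude ratio below 1, as in this synthetic case, would indicate that the satellite sees the features in the right places but smooths their magnitude.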

In [ ]:
### 6.3.3 Centered anomalies comparison
    plt.figure()
    plt.plot(df_mvp['dist'], df_mvp['dhc'],
             label='MVP (centered adt)', color=mvp_color, zorder=5)
    plt.plot(df_mvp['dist'], adtc_swot_extra,
             label='SWOT (centered adt)', color=swot_color_adt, zorder=3)
    plt.plot(df_mvp['dist'], adtc_duacs,
             label='DUACS (centered adt)', color=swot_color_total, zorder=2)
    plt.xlabel('Distance along transect (km)')
    plt.ylabel('Dynamic Topography Anomaly (m)')
    plt.legend(loc='lower right', fontsize=7)
    plt.title(f'PL{PL} - Extra Plot 3: Centered anomalies comparison')
    plt.show()

6.3.4 Subplots – Centered Anomalies & Differences (I)¶

This multi-panel figure provides a comprehensive comparison of dynamic topography measurements by combining the centered anomalies in the upper panel with their pairwise differences in the lower panel. This approach enables both qualitative observation of patterns and quantitative assessment of discrepancies.

The figure contains two vertically stacked panels sharing the same x-axis (distance along transect):

Upper Panel: Centered Anomalies¶

Similar to the previous plot, this panel displays the centered dynamic topography anomalies from three sources:

  • MVP (red line): In-situ reference measurement
  • SWOT (blue line): New wide-swath altimetry
  • DUACS (green line): Established multi-mission altimetry

A horizontal gray line at y=0 provides a reference baseline.

Lower Panel: Pairwise Differences¶

This panel displays three difference curves, quantifying the discrepancies between datasets:

  1. SWOT - MVP (blue line):

    • Directly quantifies the error in SWOT measurements relative to in-situ data
    • Positive values indicate SWOT overestimation; negative values indicate underestimation
  2. DUACS - MVP (green line):

    • Shows the error in the established DUACS product relative to in-situ data
    • Provides context for evaluating SWOT's performance
  3. SWOT - DUACS (orange line):

    • Reveals differences between the two satellite products
    • Helps identify whether certain discrepancies are specific to SWOT or common to satellite altimetry in general

Both panels use fixed y-axis limits to facilitate comparison, with anomalies ranging from -0.08 to 0.08 meters and differences ranging from -0.06 to 0.06 meters.

This visualization enables several key analyses:

  • Error magnitude: The lower panel directly shows the magnitude of errors in satellite measurements, typically within a few centimeters
  • Systematic biases: Consistent positive or negative differences would indicate systematic biases in the satellite data
  • Location-specific errors: Points along the transect where differences spike may indicate features that are poorly resolved by satellite altimetry
  • Comparative performance: Comparing the blue and green lines reveals whether SWOT demonstrates improved agreement with in-situ data compared to DUACS

This combined visualization provides an optimal framework for comprehensive validation of SWOT measurements against both in-situ data and established satellite products.
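Location-specific errors of the kind described above can be flagged automatically by marking points where the difference deviates from its mean by more than, say, two standard deviations. A minimal sketch with a synthetic difference series (the 2-sigma threshold and all names are illustrative choices):

```python
import numpy as np

x = np.linspace(0, 60, 120)      # distance along transect (km)
diff = 0.01 * np.sin(x / 4)      # smooth SWOT - MVP difference (m)
diff[60] = 0.06                  # one poorly resolved feature

sigma = np.nanstd(diff)
spikes = np.where(np.abs(diff - np.nanmean(diff)) > 2 * sigma)[0]
```

The indices in `spikes` could then be mapped back to positions along the transect to check whether they coincide with fronts or eddy edges.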

In [ ]:
### 6.3.4 Subplots – centred anomalies & differences (I)
    fig, axs = plt.subplots(2, 1, sharex=True)
    axs[0].plot(df_mvp['dist'], df_mvp['dhc'],
                label='MVP (centered adt)', color=mvp_color, zorder=5)
    axs[0].plot(df_mvp['dist'], adtc_swot_extra,
                label='SWOT (centered adt)', color=swot_color_adt, zorder=3)
    axs[0].plot(df_mvp['dist'], adtc_duacs,
                label='DUACS (centered adt)', color=swot_color_total, zorder=2)
    axs[0].plot([df_mvp['dist'].min(), df_mvp['dist'].max()], [0, 0],
                '-', color='grey', zorder=1)
    axs[0].set_ylabel('Dynamic Topography Anomaly (m)')
    axs[0].legend(loc='lower right', fontsize=7)
    axs[0].set_ylim(-0.08, 0.08)

    axs[1].plot(df_mvp['dist'], adtc_swot_extra - df_mvp['dhc'],
                label='SWOT - MVP', color=swot_color_adt, zorder=3)
    axs[1].plot(df_mvp['dist'], adtc_duacs - df_mvp['dhc'],
                label='DUACS - MVP', color=swot_color_total, zorder=3)
    axs[1].plot(df_mvp['dist'], adtc_swot_extra - adtc_duacs,
                label='SWOT - DUACS', color='orange', zorder=3)
    axs[1].plot([df_mvp['dist'].min(), df_mvp['dist'].max()], [0, 0],
                '-', color='grey', zorder=1)
    axs[1].set_xlabel('Distance (km)')
    axs[1].set_ylabel('Measurement difference (m)')
    axs[1].legend(loc='lower right', fontsize=7)
    axs[1].set_ylim(-0.06, 0.06)
    plt.tight_layout()
    plt.suptitle(
        f'PL{PL} - Extra Plot 4: Subplots – Centered anomalies & differences (I)',
        y=1.02
    )
    plt.show()

6.3.5 Subplots – Centered Anomalies & Differences (II)¶

This figure provides an alternative perspective on the differences between measurements by inverting two of the difference calculations in the lower panel. While the upper panel remains identical to the previous figure, the lower panel now shows differences from MVP's perspective rather than from the satellite products' perspective.

Upper Panel: Centered Anomalies¶

As in the previous figure, this panel displays the centered dynamic topography anomalies from three sources:

  • MVP (red line): In-situ reference measurement
  • SWOT (blue line): New wide-swath altimetry
  • DUACS (green line): Established multi-mission altimetry

Lower Panel: Reversed Pairwise Differences¶

The key distinction in this figure is that two of the difference calculations have been inverted:

  1. MVP - SWOT (blue line):

    • Now calculated as df_mvp['dhc'] - adtc_swot_extra (reversed from previous figure)
    • Positive values now indicate MVP measuring higher than SWOT
    • Frames the comparison from the perspective of the in-situ measurement
  2. MVP - DUACS (green line):

    • Now calculated as df_mvp['dhc'] - adtc_duacs (reversed from previous figure)
    • Shows how in-situ measurements differ from the established satellite product
  3. SWOT - DUACS (orange line):

    • Remains unchanged from the previous figure
    • Continues to show inter-satellite differences

This reversed perspective is valuable because:

  1. Reference framing: It positions the MVP as the reference measurement, which aligns with validation methodology where in-situ data is typically considered the ground truth.

  2. Error interpretation: It allows for more intuitive interpretation of where satellite products may be underestimating or overestimating dynamic topography relative to direct oceanographic measurements.

  3. Complementary view: When analyzed alongside the previous figure, it provides a complete picture of the relationships between the three datasets.

The consistent y-axis limits (-0.06 to 0.06 meters for differences) allow for direct comparison with the previous figure, making it easy to see that the blue and green lines are simply inverted while the orange line remains the same.

This alternative presentation of differences highlights the importance of considering the reference frame when validating satellite measurements and provides additional insights into the nature and magnitude of discrepancies between different measurement systems.

In [ ]:
### 6.3.5 Subplots – centred anomalies & differences (II)
    fig, axs = plt.subplots(2, 1, sharex=True)
    axs[0].plot(df_mvp['dist'], df_mvp['dhc'],
                label='MVP (centered adt)', color=mvp_color, zorder=5)
    axs[0].plot(df_mvp['dist'], adtc_swot_extra,
                label='SWOT (centered adt)', color=swot_color_adt, zorder=3)
    axs[0].plot(df_mvp['dist'], adtc_duacs,
                label='DUACS (centered adt)', color=swot_color_total, zorder=2)
    axs[0].plot([df_mvp['dist'].min(), df_mvp['dist'].max()], [0, 0],
                '-', color='grey', zorder=1)
    axs[0].set_ylabel('Dynamic Topography Anomaly (m)')
    axs[0].legend(loc='lower right', fontsize=7)
    axs[0].set_ylim(-0.08, 0.08)

    axs[1].plot(df_mvp['dist'], df_mvp['dhc'] - adtc_swot_extra,
                label='MVP - SWOT', color=swot_color_adt, zorder=3)
    axs[1].plot(df_mvp['dist'], df_mvp['dhc'] - adtc_duacs,
                label='MVP - DUACS', color=swot_color_total, zorder=3)
    axs[1].plot(df_mvp['dist'], adtc_swot_extra - adtc_duacs,
                label='SWOT - DUACS', color='orange', zorder=3)
    axs[1].plot([df_mvp['dist'].min(), df_mvp['dist'].max()], [0, 0],
                '-', color='grey', zorder=1)
    axs[1].set_xlabel('Distance (km)')
    axs[1].set_ylabel('Measurement difference (m)')
    axs[1].legend(loc='lower right', fontsize=7)
    axs[1].set_ylim(-0.06, 0.06)
    plt.tight_layout()
    plt.suptitle(
        f'PL{PL} - Extra Plot 5: Subplots – Centered anomalies & differences (II)',
        y=1.02
    )
    plt.show()

Full Code and Plots (I)¶

In [3]:
# 1. IMPORT LIBRARIES
## 1.1 Core scientific computing
import gsw                          # Gibbs Seawater toolbox
import numpy as np
import scipy.io as sio

## 1.2 Geospatial processing
import geopy.distance
from mpl_toolkits.basemap import Basemap

## 1.3 Visualisation tools
import matplotlib.pyplot as plt

## 1.4 Data handling and I/O
import pandas as pd
import glob
import os
from datetime import datetime, timedelta
import xarray as xr
import netCDF4 as nc

## 1.5 Interpolation utilities
from scipy.interpolate import griddata
from scipy.spatial import cKDTree, ConvexHull

## 1.6 Pattern matching and progress monitoring
import re
from tqdm import tqdm

## 1.7 Statistical diagnostics
import statsmodels.api as sm        # diagnostic plots

# 1.8 Colour palette for plots
swot_color_adt   = "blue"     # SWOT velocities derived from ADT
swot_color_total = "green"    # SWOT total velocities
swot_color_proj  = "#006400"  # SWOT projected velocities
mvp_color        = "red"      # MVP velocities


# 2. UTILITY FUNCTION DEFINITIONS
## 2.1 File loading and preprocessing
### 2.1.1 Generic loader for SSH and velocity fields
def loadsshuv(fname, rep, varn, **kwargs):
    """
    Load lon/lat, SSH, U, V and ADT fields from a NetCDF file,
    with optional geographic sub-setting.
    """
    if kwargs.get('type') == 'swot':
        filei = glob.glob(rep + fname)[0]
        ds = xr.open_dataset(filei)
        if 'area' in kwargs:
            area = kwargs['area']
            selection = ((ds.longitude > area[0]) &
                         (ds.longitude < area[2]) &
                         (ds.latitude  > area[1]) &
                         (ds.latitude  < area[3]))
            ds = ds.where(selection)

        lons = ds.longitude.to_masked_array()
        lats = ds.latitude.to_masked_array()
        us   = ds.ugos.to_masked_array()
        vs   = ds.vgos.to_masked_array()
        ssha = ds.ssha_filtered.to_masked_array()
        mss  = ds.mss.to_masked_array()
        mdt  = ds.mdt.to_masked_array()

        # Remove rows that contain only NaNs
        mask_row = np.all(np.isnan(us), axis=1)
        u    = us[~mask_row, :]
        v    = vs[~mask_row, :]
        lon  = lons[~mask_row, :]
        lat  = lats[~mask_row, :]
        ssha = ssha[~mask_row, :]
        mss  = mss[~mask_row, :]
        mdt  = mdt[~mask_row, :]

        # Absolute Dynamic Topography (ADT)
        adt = ssha + mdt
    else:
        # Generic (e.g. DUACS) loader
        filei = glob.glob(rep + fname)[0]
        ds = xr.open_dataset(filei)
        lon  = ds[varn['longitude']][:]
        lat  = ds[varn['latitude']][:]
        ssha = ds[varn['ssh']][:]
        adt  = ds['adt'][:]
        u    = ds[varn['u']][:]
        v    = ds[varn['v']][:]
    return {
        'lon' : lon,
        'lat' : lat,
        'ssh' : ssha,
        'u'   : u,
        'v'   : v,
        'ssha': ssha,
        'adt' : adt
    }

### 2.1.2 Julian-day to calendar conversion
def jj2date(data, origin):
    """
    Convert fractional Julian days into calendar dates (YYYY-MM-DD hh:mm:ss).
    """
    data   = np.asarray(data)
    origin = pd.to_datetime(origin)
    times  = []
    for d in data:
        day = (origin + pd.to_timedelta(d, unit='D')).strftime('%Y-%m-%d')
        h   = int((d % 1) * 24)
        m   = int(((d % 1) * 24 % 1) * 60)
        s   = int((((d % 1) * 24 % 1) * 60 % 1) * 60)
        times.append(f"{day} {h:02d}:{m:02d}:{s:02d}")
    return np.array(times)

### 2.1.3 Recursive flattening of nested structures
def extract_values(nested_array):
    """
    Flatten arbitrarily nested arrays, tuples or lists into a single list.
    """
    values = []
    def recurse(arr):
        for element in arr:
            if isinstance(element, np.ndarray):
                if element.dtype == 'O':
                    recurse(element)
                else:
                    values.append(element.item())
            elif isinstance(element, tuple):
                for item in element:
                    recurse(item)
            else:
                values.append(element)
    recurse(nested_array)
    return values

## 2.2 Vector manipulation utilities
### 2.2.1 Planar rotation
def rotate_vector(u, v, angle_deg):
    """
    Rotate the vector (u, v) by angle_deg degrees.
    """
    angle_rad = np.deg2rad(angle_deg)
    u_rot = u * np.cos(angle_rad) - v * np.sin(angle_rad)
    v_rot = u * np.sin(angle_rad) + v * np.cos(angle_rad)
    return u_rot, v_rot

### 2.2.2 Moving average (centred window)
def moving_average(x, interval):
    """
    Compute a moving average of x over a centred window of length = interval.
    """
    start = (interval - 1) // 2
    liste_moyennes = []
    for i in range(start, len(x) - start):
        liste_moyennes.append(np.mean(x[i - start:i + start + 1]))
    return liste_moyennes

### 2.2.3 Projection of one vector onto another
def project_vector(uA, vA, uB, vB):
    """
    Project vector (uA, vA) onto the direction of (uB, vB).
    """
    normB = np.sqrt(uB**2 + vB**2)
    if normB == 0:
        return 0., 0.
    uB_hat, vB_hat = uB / normB, vB / normB
    scalar_proj = uA*uB_hat + vA*vB_hat
    return scalar_proj * uB_hat, scalar_proj * vB_hat

### 2.2.4 Date extraction from SWOT filenames
def extract_key_swot(filename):
    match = re.search(r'\d{8}T\d{6}', filename)
    if match:
        return datetime.strptime(match.group(0)[:8], '%Y%m%d').date()
    return None


# 3. SATELLITE AND AUXILIARY DATA HANDLING
## 3.1 SWOT and DUACS co-location
### 3.1.1 SWOT interpolation onto MVP positions
def load_SWOT(mvp, dayv, swotdir, savefig, **kwargs):
    """
    Load SWOT data and interpolate ADT onto MVP sampling points.
    """
    domain = [np.floor(np.min(mvp['lon'])) - 0.5,
              np.floor(np.min(mvp['lat'])) - 0.5,
              np.ceil(np.max(mvp['lon'])) + 0.5,
              np.ceil(np.max(mvp['lat'])) + 0.5]
    dayv_dt = datetime.strptime(dayv, "%Y-%m-%d")
    varn = {'longitude': 'longitude', 'latitude': 'latitude',
            'u': 'ugos', 'v': 'vgos', 'ssh': 'ssha_filtered'}
    fname = glob.glob(swotdir + 'SWOT_L3_LR_SSH_Expert_*_' +
                      dayv_dt.strftime('%Y%m%d') + '*_*.nc')
    if not fname:
        print(f"No SWOT file found in load_SWOT for date {dayv}")
        return None
    swot = loadsshuv(fname[0], '', varn, type='swot', area=domain)
    points = np.array((swot['lon'].flatten(), swot['lat'].flatten())).T
    valuesadt = swot['adt'].flatten()
    adtc = griddata(points, valuesadt,
                    (mvp['lon'], mvp['lat']), method='linear')
    return np.squeeze(adtc)

### 3.1.2 DUACS interpolation onto MVP positions
def load_DUACS(mvp, dayv, duacsdir, savefig, **kwargs):
    """
    Load DUACS data and interpolate ADT onto MVP sampling points.
    """
    dayv_dt = datetime.strptime(dayv, "%Y-%m-%d")
    varn = {'longitude': 'longitude', 'latitude': 'latitude',
            'u': 'ugos', 'v': 'vgos', 'ssh': 'sla'}
    fname = glob.glob(duacsdir + 'nrt_europe_allsat_phy_l4_' +
                      dayv_dt.strftime('%Y%m%d') + '_*.nc')
    if not fname:
        print(f"No DUACS file found in load_DUACS for date {dayv}")
        return None
    duacs = loadsshuv(fname[0], '', varn)
    X, Y = np.meshgrid(duacs['lon'].values, duacs['lat'].values)
    points = np.array((X.flatten(), Y.flatten())).T
    valuesadt = duacs['adt'].values.flatten()
    adtd = griddata(points, valuesadt,
                    (mvp['lon'], mvp['lat']), method='linear')
    return np.squeeze(adtd)

## 3.2 SWOT file catalogue and acquisition dates
swot_files = glob.glob('/Volumes/Extreme SSD/MIO/SWOT_V2.0.0/v2.0.0/'
                       'SWOT_L3_LR_SSH_Expert_*_v2.0.0.nc')
swot_files.sort(key=extract_key_swot)
dates_swot = []
for files in swot_files:
    filename  = os.path.basename(files)
    date_strs = os.path.splitext(filename)[0].split('_')[7:8]
    for date_str in date_strs:
        dates_swot.append(datetime.strptime(date_str[:8], '%Y%m%d').date())

## 3.3 Frontal region convex-hull computation
df_f = pd.read_csv("/Volumes/Extreme SSD/MIO/tsg_crown_AFB.csv")
region_f_df = df_f[df_f['Region'] == 'F']
coords_f = region_f_df[['Longitude', 'Latitude']].dropna().values
if len(coords_f) > 2:
    hull_f = ConvexHull(coords_f)
    polygon_points_f = coords_f[hull_f.vertices]
else:
    polygon_points_f = coords_f

### 3.3.1 Frontal hull plotting
def plot_fronte(m):
    """
    Draw the convex hull of the front in violet.
    """
    if len(polygon_points_f) < 2:
        return
    for i in range(len(polygon_points_f)):
        j = (i + 1) % len(polygon_points_f)
        lon1, lat1 = polygon_points_f[i]
        lon2, lat2 = polygon_points_f[j]
        x1, y1 = m(lon1, lat1)
        x2, y2 = m(lon2, lat2)
        style = '--' if abs(lon2 - lon1) > abs(lat2 - lat1) else ':'
        if i == 0:
            m.plot([x1, x2], [y1, y2], color='purple',
                   linestyle=style, linewidth=2,
                   zorder=15, label='Front')
        else:
            m.plot([x1, x2], [y1, y2], color='purple',
                   linestyle=style, linewidth=2, zorder=15)


# 4. TRANSECT-BASED VELOCITY DIAGNOSTICS
## 4.1 Signed component perpendicular to transect
def compute_signed_speed_along_transect(u, v, lon, lat):
    """
    Compute the signed scalar component along the direction perpendicular
    to the transect defined by (lon, lat).
    """
    N_val = len(u)
    w_signed = np.zeros(N_val)
    for i in range(N_val):
        if N_val == 1:
            w_signed[i] = 0.0
            continue
        if i == 0:
            dx = lon[i+1] - lon[i]
            dy = lat[i+1] - lat[i]
        elif i == N_val-1:
            dx = lon[i] - lon[i-1]
            dy = lat[i] - lat[i-1]
        else:
            dx = lon[i+1] - lon[i-1]
            dy = lat[i+1] - lat[i-1]
        nx, ny = dy, -dx
        norm_n = np.sqrt(nx**2 + ny**2)
        if norm_n < 1e-10:
            w_signed[i] = 0.0
            continue
        nx, ny = nx / norm_n, ny / norm_n
        w_signed[i] = (u[i]*nx + v[i]*ny)
    return w_signed

## 4.2 Signed full-speed magnitude
def compute_signed_speed_tot(u, v, lon, lat):
    """
    Return signed total speed: |(u, v)| with sign determined by the
    projection onto (nx, ny).
    """
    N_val = len(u)
    w_signed_tot = np.zeros(N_val)
    for i in range(N_val):
        if N_val == 1:
            w_signed_tot[i] = 0.0
            continue
        if i == 0:
            dx = lon[i+1] - lon[i]
            dy = lat[i+1] - lat[i]
        elif i == N_val-1:
            dx = lon[i] - lon[i-1]
            dy = lat[i] - lat[i-1]
        else:
            dx = lon[i+1] - lon[i-1]
            dy = lat[i+1] - lat[i-1]
        nx, ny = dy, -dx
        norm_n = np.sqrt(nx**2 + ny**2)
        speed = np.sqrt(u[i]**2 + v[i]**2)
        if norm_n < 1e-10:
            w_signed_tot[i] = 0.0
            continue
        nx, ny = nx / norm_n, ny / norm_n
        sign_ = 1.0 if (u[i]*nx + v[i]*ny) >= 0 else -1.0
        w_signed_tot[i] = sign_ * speed
    return w_signed_tot

# 5. MVP TRANSECT PROCESSING LOOP
## 5.0 Preliminary set-up
PLs = np.array([5, 6, 7, 11, 12, 15, 17])

for PL in tqdm(PLs):

    # 5.1 LOAD MVP DATA AND COMPUTE DYNAMIC HEIGHT
    ## 5.1.1 Read .mat metadata and convert dates
    mat_file = sio.loadmat(
        '/Volumes/Extreme SSD/MIO/MVP_TOOLS/BIOSWOTMED/'
        f'MVP/L2/PL{PL}.mat'
    )
    date_extract = extract_values(mat_file['metadata']['date'])
    date_pd  = pd.DataFrame(date_extract)
    datejj   = date_pd[0]
    # MATLAB datenum -> Python datetime (keep the fractional day, subtract the 366-day origin offset)
    dates_mvp = [
        datetime.fromordinal(int(d[0][0])) +
        timedelta(days=d[0][0] % 1) -
        timedelta(days=366)
        for d in datejj
    ]

    ## 5.1.2 Oceanographic variables and centred dynamic height anomaly
    mat = sio.loadmat(
        '/Volumes/Extreme SSD/MIO/MVP_TOOLS/BIOSWOTMED/'
        f'MVP/L2_python/PL{PL}_python.mat'
    )
    lon_mvp, lat_mvp = mat['lon'], mat['lat']
    p_mvp, t_mvp     = mat['pres'], mat['temp']
    SP_mvp           = mat['sal']

    SA = gsw.SA_from_SP(SP_mvp, p_mvp, lon_mvp.T, lat_mvp.T)
    CT = gsw.CT_from_t(SA, t_mvp, p_mvp)
    g_ = 9.81  # gravitational acceleration (m s⁻²)

    dh = []
    for i in range(SA.shape[0]):
        dh_i = (
            gsw.geo_strf_dyn_height(
                SA[i][~np.isnan(SA[i])],
                CT[i][~np.isnan(SA[i])],
                p_mvp[i][~np.isnan(SA[i])],
                p_ref=np.max(p_mvp[i][~np.isnan(SA[i])])
            )[0] / g_
        )
        dh.append(dh_i)
    dh = np.asarray(dh)    # list -> array so the subtraction below broadcasts
    dhmean = np.mean(dh)
    dhc = dh - dhmean      # centred anomaly

    df_mvp_raw = pd.DataFrame({
        'dh' : dh,
        'dhc': dhc,
        'lon': lon_mvp[0],
        'lat': lat_mvp[0]
    })

    # 5.2 DISTANCE METRIC AND REGULAR GRIDDING
    ## 5.2.1 Cumulative distance along transect
    dist_along_mvp = np.zeros(len(df_mvp_raw))
    for i in range(1, len(df_mvp_raw)):
        dist_along_mvp[i] = (
            dist_along_mvp[i-1] +
            geopy.distance.distance(
                (df_mvp_raw['lat'].values[i-1],
                 df_mvp_raw['lon'].values[i-1]),
                (df_mvp_raw['lat'].values[i],
                 df_mvp_raw['lon'].values[i])
            ).km
        )
    df_mvp_raw['dist'] = dist_along_mvp

    ## 5.2.2 Interpolation to 1-km grid and 11-km smoothing
    step_km = 1
    window_size_km = 11
    dist_max = df_mvp_raw['dist'].iloc[-1]
    dist_reg = np.arange(0, dist_max + step_km, step_km)
    lon_reg  = np.interp(dist_reg, df_mvp_raw['dist'], df_mvp_raw['lon'])
    lat_reg  = np.interp(dist_reg, df_mvp_raw['dist'], df_mvp_raw['lat'])
    dh_reg   = np.interp(dist_reg, df_mvp_raw['dist'], df_mvp_raw['dh'])
    dhc_reg  = np.interp(dist_reg, df_mvp_raw['dist'], df_mvp_raw['dhc'])

    window_size = int(window_size_km / step_km)
    if window_size % 2 == 0:
        window_size += 1  # enforce odd window length

    dh_reg_smooth = (
        pd.Series(dh_reg)
        .rolling(window=window_size, center=True)
        .mean()
        .values
    )
    dhc_reg_smooth = (
        pd.Series(dhc_reg)
        .rolling(window=window_size, center=True)
        .mean()
        .values
    )

    df_mvp = pd.DataFrame({
        'dist'    : dist_reg,
        'lon'     : lon_reg,
        'lat'     : lat_reg,
        'dh_raw'  : dh_reg,
        'dhc_raw' : dhc_reg,
        'dh'      : dh_reg_smooth,
        'dhc'     : dhc_reg_smooth
    }).dropna(subset=['dh', 'dhc']).copy()

    # 5.3 GEOSTROPHIC VELOCITY FROM MVP
    w_mvp_smooth = gsw.geostrophy.geostrophic_velocity(
        df_mvp['dh'].values,
        df_mvp['lon'].values,
        df_mvp['lat'].values
    )
    N_mvp = len(w_mvp_smooth[0])
    df_mvp = df_mvp.iloc[:N_mvp].copy()

    u_mvp = np.zeros(N_mvp)
    v_mvp = np.zeros(N_mvp)
    for i in range(N_mvp):
        if N_mvp > 1:
            if i == 0:
                dx = w_mvp_smooth[1][i+1] - w_mvp_smooth[1][i]
                dy = w_mvp_smooth[2][i+1] - w_mvp_smooth[2][i]
            elif i == N_mvp-1:
                dx = w_mvp_smooth[1][i] - w_mvp_smooth[1][i-1]
                dy = w_mvp_smooth[2][i] - w_mvp_smooth[2][i-1]
            else:
                dx = w_mvp_smooth[1][i+1] - w_mvp_smooth[1][i-1]
                dy = w_mvp_smooth[2][i+1] - w_mvp_smooth[2][i-1]
            angle = np.arctan2(dx, dy)  # local transect heading
            # g_ restores the factor removed when dh was divided by g above,
            # so the rotated components come out in m/s
            u_mvp[i] = - w_mvp_smooth[0][i] * g_ * np.cos(angle)
            v_mvp[i] =   w_mvp_smooth[0][i] * g_ * np.sin(angle)

    df_mvp['u_mvp'] = u_mvp
    df_mvp['v_mvp'] = v_mvp

    # 5.4 MATCH MVP DATES TO SWOT
    ## 5.4.1 Identify corresponding SWOT file
    dayv = None
    for dsw in dates_swot:
        if dsw == dates_mvp[0].date():
            dayv = dsw.strftime('%Y-%m-%d')
            break
    if not dayv:
        print(f"No matching (SWOT) date for PL{PL}")
        continue

    domain = [3, 38.5, 6.5, 44]  # [lon_min, lat_min, lon_max, lat_max]
    dayv_dt = datetime.strptime(dayv, "%Y-%m-%d")
    varn_swot = {
        'longitude': 'longitude',
        'latitude' : 'latitude',
        'u'        : 'ugos',
        'v'        : 'vgos',
        'adt'      : 'adt'
    }
    fname = glob.glob(
        '/Volumes/Extreme SSD/MIO/SWOT_V2.0.0/v2.0.0/'
        f'SWOT_L3_LR_SSH_Expert_*{dayv_dt.strftime("%Y%m%d")}*_*.nc'
    )
    if not fname:
        print(f"No SWOT file found for date: {dayv} - PL{PL}")
        continue

    swot = loadsshuv(fname[0], '', varn_swot, type='swot', area=domain)
    adt_flat_swot = swot['adt'].flatten()
    lon_flat_swot = swot['lon'].flatten()
    lat_flat_swot = swot['lat'].flatten()
    u_swot_tot    = swot['u'].flatten()
    v_swot_tot    = swot['v'].flatten()

    adtmean_swot = np.nanmean(adt_flat_swot)
    adtc_swot = adt_flat_swot - adtmean_swot

    # 5.4.2 Nearest-neighbour search to MVP transect
    valid_mask = ~np.isnan(lon_flat_swot) & ~np.isnan(lat_flat_swot)
    lon_valid = lon_flat_swot[valid_mask]
    lat_valid = lat_flat_swot[valid_mask]
    u_valid   = u_swot_tot[valid_mask]
    v_valid   = v_swot_tot[valid_mask]
    adt_valid = adt_flat_swot[valid_mask]

    tree = cKDTree(np.c_[lon_valid, lat_valid])
    _, indices = tree.query(
        np.c_[df_mvp['lon'].values, df_mvp['lat'].values], k=1
    )
    u_swot = u_valid[indices]
    v_swot = v_valid[indices]
    adt_on_transect = adt_valid[indices]

    # 5.5 DIAGNOSTIC SCALING AND SWOT-FROM-ADT COMPUTATION
    coeff_scaled = 100000  # m/s -> projected map units, so quiver arrows are visible
    u_mvp_scaled = df_mvp['u_mvp'].values * coeff_scaled
    v_mvp_scaled = df_mvp['v_mvp'].values * coeff_scaled

    ## 5.5.1 Geostrophic velocity from interpolated SWOT ADT
    adt_swot_interp = griddata(
        (lon_valid, lat_valid),
        adt_valid,
        (df_mvp['lon'].values, df_mvp['lat'].values),
        method='linear'
    )
    w_swot_calc_raw = gsw.geostrophy.geostrophic_velocity(
        adt_swot_interp,
        df_mvp['lon'].values,
        df_mvp['lat'].values
    )
    N2 = min(len(w_swot_calc_raw[0]),
             len(w_swot_calc_raw[1]),
             len(w_swot_calc_raw[2]))
    u_swot_calc = np.zeros(N2)
    v_swot_calc = np.zeros(N2)
    for i in range(N2):
        if N2 == 1:
            dx2 = dy2 = 0
        elif i == 0:
            dx2 = w_swot_calc_raw[1][i+1] - w_swot_calc_raw[1][i]
            dy2 = w_swot_calc_raw[2][i+1] - w_swot_calc_raw[2][i]
        elif i == N2-1:
            dx2 = w_swot_calc_raw[1][i] - w_swot_calc_raw[1][i-1]
            dy2 = w_swot_calc_raw[2][i] - w_swot_calc_raw[2][i-1]
        else:
            dx2 = w_swot_calc_raw[1][i+1] - w_swot_calc_raw[1][i-1]
            dy2 = w_swot_calc_raw[2][i+1] - w_swot_calc_raw[2][i-1]
        angle2 = np.arctan2(dx2, dy2)
        u_swot_calc[i] = - w_swot_calc_raw[0][i] * g_ * np.cos(angle2)
        v_swot_calc[i] =   w_swot_calc_raw[0][i] * g_ * np.sin(angle2)

    scale_calc = 100000  # same quiver scaling as coeff_scaled above
    u_swot_calc_scaled = u_swot_calc * scale_calc
    v_swot_calc_scaled = v_swot_calc * scale_calc

    # 6. VISUALISATION AND COMPARATIVE ANALYSIS
    ## 6.1 Map-based diagnostics
    ### 6.1.1 ADT-based velocity vs MVP
    min_len_calc = min(N2, len(df_mvp))
    plt.figure(figsize=(12, 8))
    m = Basemap(
        projection='merc',
        llcrnrlat=df_mvp['lat'].min() - 0.4,
        urcrnrlat=df_mvp['lat'].max() + 0.4,
        llcrnrlon=df_mvp['lon'].min() - 0.4,
        urcrnrlon=df_mvp['lon'].max() + 0.4,
        resolution='i'
    )
    m.drawcoastlines()
    m.drawcountries()
    m.drawmapboundary(fill_color='lightblue')
    m.fillcontinents(color='beige', lake_color='lightblue')
    m.drawparallels(np.arange(-90., 91., 1.), labels=[True, False, False, False])
    m.drawmeridians(np.arange(-180., 181., 1.), labels=[False, False, False, True])

    x_swot_map, y_swot_map = m(lon_valid, lat_valid)
    sc = m.scatter(
        x_swot_map, y_swot_map,
        c=adt_valid - np.nanmean(adt_valid),
        cmap='Spectral_r', vmin=-0.1, vmax=0.1, zorder=1
    )

    x_pl_map_1, y_pl_map_1 = m(
        df_mvp['lon'].values[:min_len_calc],
        df_mvp['lat'].values[:min_len_calc]
    )
    m.scatter(x_pl_map_1, y_pl_map_1, color='black', label=f'PL{PL}', zorder=5, s=10)

    q1 = m.quiver(
        x_pl_map_1, y_pl_map_1,
        u_swot_calc_scaled[:min_len_calc],
        v_swot_calc_scaled[:min_len_calc],
        angles='xy', scale_units='xy', scale=1,
        color=swot_color_adt, zorder=5,
        label='SWOT velocity (from ADT)'
    )
    m.quiver(
        x_pl_map_1, y_pl_map_1,
        u_mvp_scaled[:min_len_calc],
        v_mvp_scaled[:min_len_calc],
        angles='xy', scale_units='xy', scale=1,
        color=mvp_color, zorder=10, label='MVP velocity'
    )

    plt.quiverkey(q1, 0.80, 0.1, 0.10 * scale_calc, '0.10 m/s',
                  labelpos='E', coordinates='axes', color='black')
    plt.colorbar(sc, label='SWOT adt (m) [unsmoothed]')
    plt.title(f'PL{PL} - {dayv_dt.strftime("%Y-%m-%d")} - (1) ADT-based Velocity vs MVP')
    plot_fronte(m)
    plt.legend(loc='upper right')
    plt.show()

    ### 6.1.2 Full SWOT velocity vs MVP
    min_len_uv = min(len(u_swot), len(df_mvp))
    plt.figure(figsize=(12, 8))
    m = Basemap(
        projection='merc',
        llcrnrlat=df_mvp['lat'].min() - 0.4,
        urcrnrlat=df_mvp['lat'].max() + 0.4,
        llcrnrlon=df_mvp['lon'].min() - 0.4,
        urcrnrlon=df_mvp['lon'].max() + 0.4,
        resolution='i'
    )
    m.drawcoastlines()
    m.drawcountries()
    m.drawmapboundary(fill_color='lightblue')
    m.fillcontinents(color='beige', lake_color='lightblue')
    m.drawparallels(np.arange(-90., 91., 1.), labels=[True, False, False, False])
    m.drawmeridians(np.arange(-180., 181., 1.), labels=[False, False, False, True])

    x_pl, y_pl = m(
        df_mvp['lon'].values[:min_len_uv],
        df_mvp['lat'].values[:min_len_uv]
    )
    x_swot, y_swot = m(lon_valid, lat_valid)
    sc = m.scatter(
        x_swot, y_swot,
        c=adt_valid - np.nanmean(adt_valid),
        cmap='Spectral_r', vmin=-0.1, vmax=0.1, zorder=1
    )
    m.scatter(x_pl, y_pl, color='black', label=f'PL{PL}', zorder=5, s=10)

    u_swot_scaled_tot = u_swot[:min_len_uv] * coeff_scaled
    v_swot_scaled_tot = v_swot[:min_len_uv] * coeff_scaled

    q2 = m.quiver(
        x_pl, y_pl,
        u_swot_scaled_tot, v_swot_scaled_tot,
        angles='xy', scale_units='xy', scale=1,
        color=swot_color_total, zorder=5,
        label='SWOT velocity (TOTAL)'
    )
    m.quiver(
        x_pl, y_pl,
        u_mvp_scaled[:min_len_uv], v_mvp_scaled[:min_len_uv],
        angles='xy', scale_units='xy', scale=1,
        color=mvp_color, zorder=10, label='MVP velocity'
    )

    plt.quiverkey(q2, 0.80, 0.1, 0.10 * coeff_scaled, '0.10 m/s',
                  labelpos='E', coordinates='axes', color='black')
    plt.colorbar(sc, label='SWOT adt (m)')
    plt.title(f'PL{PL} - {dayv_dt.strftime("%Y-%m-%d")} - (2) FULL Velocity')
    plot_fronte(m)
    plt.legend(loc='upper right')
    plt.show()

    ### 6.1.3 Projected SWOT velocity vs MVP
    w_swot_proj_sign_map = compute_signed_speed_along_transect(
        u_swot, v_swot, df_mvp['lon'].values, df_mvp['lat'].values
    )
    u_swot_proj_map = np.zeros(len(df_mvp))
    v_swot_proj_map = np.zeros(len(df_mvp))
    for i in range(len(df_mvp)):
        if len(df_mvp) == 1:
            dx = dy = 0
        elif i == 0:
            dx = df_mvp['lon'].values[i+1] - df_mvp['lon'].values[i]
            dy = df_mvp['lat'].values[i+1] - df_mvp['lat'].values[i]
        elif i == len(df_mvp) - 1:
            dx = df_mvp['lon'].values[i] - df_mvp['lon'].values[i-1]
            dy = df_mvp['lat'].values[i] - df_mvp['lat'].values[i-1]
        else:
            dx = df_mvp['lon'].values[i+1] - df_mvp['lon'].values[i-1]
            dy = df_mvp['lat'].values[i+1] - df_mvp['lat'].values[i-1]
        nx, ny = dy, -dx
        norm_n = np.sqrt(nx**2 + ny**2)
        if norm_n < 1e-10:
            u_swot_proj_map[i] = 0.
            v_swot_proj_map[i] = 0.
        else:
            nx, ny = nx / norm_n, ny / norm_n
            scalar_proj = w_swot_proj_sign_map[i]
            u_swot_proj_map[i] = scalar_proj * nx
            v_swot_proj_map[i] = scalar_proj * ny

    plt.figure(figsize=(12, 8))
    m = Basemap(
        projection='merc',
        llcrnrlat=df_mvp['lat'].min() - 0.4,
        urcrnrlat=df_mvp['lat'].max() + 0.4,
        llcrnrlon=df_mvp['lon'].min() - 0.4,
        urcrnrlon=df_mvp['lon'].max() + 0.4,
        resolution='i'
    )
    m.drawcoastlines()
    m.drawcountries()
    m.drawmapboundary(fill_color='lightblue')
    m.fillcontinents(color='beige', lake_color='lightblue')
    m.drawparallels(np.arange(-90., 91., 1.), labels=[True, False, False, False])
    m.drawmeridians(np.arange(-180., 181., 1.), labels=[False, False, False, True])

    x_swot, y_swot = m(lon_valid, lat_valid)
    sc = m.scatter(
        x_swot, y_swot,
        c=adt_valid - np.nanmean(adt_valid),
        cmap='Spectral_r', vmin=-0.1, vmax=0.1, zorder=1
    )
    x_pl, y_pl = m(df_mvp['lon'].values, df_mvp['lat'].values)
    m.scatter(x_pl, y_pl, color='black', label=f'PL{PL}', zorder=5, s=10)

    u_swot_proj_scaled_map = u_swot_proj_map * coeff_scaled
    v_swot_proj_scaled_map = v_swot_proj_map * coeff_scaled
    q3 = m.quiver(
        x_pl, y_pl,
        u_swot_proj_scaled_map, v_swot_proj_scaled_map,
        angles='xy', scale_units='xy', scale=1,
        color=swot_color_proj, zorder=5,
        label='SWOT velocity (Projected)'
    )
    m.quiver(
        x_pl, y_pl,
        u_mvp_scaled, v_mvp_scaled,
        angles='xy', scale_units='xy', scale=1,
        color=mvp_color, zorder=10, label='MVP velocity'
    )
    plt.quiverkey(q3, 0.80, 0.1, 0.10 * coeff_scaled, '0.10 m/s',
                  labelpos='E', coordinates='axes', color='black')
    plt.colorbar(sc, label='SWOT adt (m)')
    plt.title(
        f'PL{PL} - {dayv_dt.strftime("%Y-%m-%d")} - (3) PROJECTED Velocity (New Method)'
    )
    plot_fronte(m)
    plt.legend(loc='upper right')
    plt.show()

    ## 6.2 One-dimensional signed velocity comparison
    ### 6.2.1 Signed velocity profiles
    w_swot_tot_sign = - compute_signed_speed_tot(
        u_swot, v_swot, df_mvp['lon'].values, df_mvp['lat'].values
    )
    w_swot_proj_sign = - compute_signed_speed_along_transect(
        u_swot, v_swot, df_mvp['lon'].values, df_mvp['lat'].values
    )
    w_mvp_sign = - compute_signed_speed_along_transect(
        df_mvp['u_mvp'].values, df_mvp['v_mvp'].values,
        df_mvp['lon'].values, df_mvp['lat'].values
    )
    w_swot_calc_sign = - compute_signed_speed_along_transect(
        u_swot_calc, v_swot_calc,
        df_mvp['lon'].values[:N2], df_mvp['lat'].values[:N2]
    )

    duacsdir = '/Volumes/Extreme SSD/MIO/DUACS/'
    savefig  = ''

    # DUACS reference velocity
    adt_duacs = load_DUACS(df_mvp, dayv, duacsdir, savefig)
    if adt_duacs is not None:
        w_duacs_raw = gsw.geostrophy.geostrophic_velocity(
            adt_duacs, df_mvp['lon'].values, df_mvp['lat'].values
        )
        N_duacs = len(w_duacs_raw[0])
        u_duacs = np.zeros(N_duacs)
        v_duacs = np.zeros(N_duacs)
        for i in range(N_duacs):
            if N_duacs == 1:
                dx_d = dy_d = 0
            elif i == 0:
                dx_d = w_duacs_raw[1][i+1] - w_duacs_raw[1][i]
                dy_d = w_duacs_raw[2][i+1] - w_duacs_raw[2][i]
            elif i == N_duacs - 1:
                dx_d = w_duacs_raw[1][i] - w_duacs_raw[1][i-1]
                dy_d = w_duacs_raw[2][i] - w_duacs_raw[2][i-1]
            else:
                dx_d = w_duacs_raw[1][i+1] - w_duacs_raw[1][i-1]
                dy_d = w_duacs_raw[2][i+1] - w_duacs_raw[2][i-1]
            angle_d = np.arctan2(dx_d, dy_d)
            u_duacs[i] = - w_duacs_raw[0][i] * g_ * np.cos(angle_d)
            v_duacs[i] =   w_duacs_raw[0][i] * g_ * np.sin(angle_d)
        w_duacs_sign_full = - compute_signed_speed_along_transect(
            u_duacs, v_duacs,
            df_mvp['lon'].values, df_mvp['lat'].values
        )
    else:
        w_duacs_sign_full = np.full(len(df_mvp), np.nan)

    min_len_4 = min(
        len(w_mvp_sign), len(w_swot_tot_sign), len(w_swot_proj_sign),
        len(w_swot_calc_sign), len(w_duacs_sign_full)
    )
    x_4              = df_mvp['dist'].values[:min_len_4]
    w_mvp_4          = w_mvp_sign[:min_len_4]
    w_swot_tot_4     = w_swot_tot_sign[:min_len_4]
    w_swot_proj_4    = w_swot_proj_sign[:min_len_4]
    w_swot_calc_4    = w_swot_calc_sign[:min_len_4]
    w_duacs_4        = w_duacs_sign_full[:min_len_4]

    duacs_color = "magenta"

    plt.figure()
    plt.plot(x_4, w_mvp_4,          'o-', color=mvp_color,        label='MVP (signed)')
    plt.plot(x_4, w_swot_tot_4,     's-', color=swot_color_total, label='SWOT TOTAL (signed)')
    plt.plot(x_4, w_swot_proj_4,    '^-', color=swot_color_proj,  label='SWOT Projected (signed)')
    plt.plot(x_4, w_swot_calc_4,    'd-', color=swot_color_adt,   label='SWOT from ADT (signed)')
    plt.plot(x_4, w_duacs_4,        'p-', color=duacs_color,      label='DUACS (signed)')
    plt.grid(True)
    plt.legend()
    plt.xlabel('Distance along transect (km)')
    plt.ylabel('Velocity (m/s) [signed]')
    plt.title(
        f'PL{PL} - (4) Signed Velocity Comparison '
        '(MVP, TOTAL, Projected, ADT-based, DUACS)'
    )
    plt.show()
    ## 6.3 Extra diagnostic plots
    swotdir = '/Volumes/Extreme SSD/MIO/SWOT_V2.0.0/v2.0.0/'
    savefig = '/Volumes/Extreme SSD/MIO/MVP_TOOLS/BIOSWOTMED/figures'

    adt_swot_extra = load_SWOT(df_mvp, dayv, swotdir, savefig)
    if adt_swot_extra is None:
        continue
    adtmean_swot_extra = np.nanmean(adt_swot_extra)
    adtc_swot_extra = adt_swot_extra - adtmean_swot_extra

    adt_duacs = load_DUACS(df_mvp, dayv, duacsdir, savefig)
    if adt_duacs is None:
        continue
    adtmean_duacs = np.nanmean(adt_duacs)
    adtc_duacs = adt_duacs - adtmean_duacs

    # Trim satellite arrays to MVP transect length
    L_full = len(df_mvp)
    adt_swot_extra  = adt_swot_extra[:L_full]
    adtc_swot_extra = adtc_swot_extra[:L_full]
    adt_duacs       = adt_duacs[:L_full]
    adtc_duacs      = adtc_duacs[:L_full]

    ### 6.3.1 Centred anomalies and raw topography
    plt.figure()
    plt.plot(df_mvp['dist'], df_mvp['dhc'],
             label='MVP (centered adt)', color=mvp_color, zorder=5)
    plt.plot(df_mvp['dist'], df_mvp['dh'],
             '--', label='MVP (raw topography)', color=mvp_color, zorder=5)
    plt.plot(df_mvp['dist'], adtc_swot_extra,
             label='SWOT (centered adt)', color=swot_color_adt, zorder=3)
    plt.plot(df_mvp['dist'], adt_swot_extra,
             '--', label='SWOT (raw topography)', color=swot_color_adt, zorder=3)
    plt.plot(df_mvp['dist'], adtc_duacs,
             label='DUACS (centered adt)', color=swot_color_total, zorder=2)
    plt.plot(df_mvp['dist'], adt_duacs,
             '--', label='DUACS (raw topography)', color=swot_color_total, zorder=2)
    plt.plot([df_mvp['dist'].min(), df_mvp['dist'].max()], [0, 0],
             '-', color='grey', zorder=1)
    plt.xlabel('Distance along transect (km)')
    plt.ylabel('Dynamic Topography Anomaly (m)')
    plt.legend(loc='lower right', fontsize=7)
    plt.title(f'PL{PL} - Extra Plot 1: Centered anomalies vs raw topography')
    plt.show()

    ### 6.3.2 Raw topography comparison
    plt.figure()
    plt.plot(df_mvp['dist'], df_mvp['dh'],
             label='MVP (raw topography)', color=mvp_color, zorder=5)
    plt.plot(df_mvp['dist'], adt_swot_extra,
             label='SWOT (raw topography)', color=swot_color_adt, zorder=3)
    plt.plot(df_mvp['dist'], adt_duacs,
             label='DUACS (raw topography)', color=swot_color_total, zorder=2)
    plt.xlabel('Distance along transect (km)')
    plt.ylabel('Dynamic Topography Anomaly (m)')
    plt.legend(loc='lower right', fontsize=7)
    plt.title(f'PL{PL} - Extra Plot 2: Raw dynamic topography comparison')
    plt.show()

    ### 6.3.3 Centered anomalies comparison
    plt.figure()
    plt.plot(df_mvp['dist'], df_mvp['dhc'],
             label='MVP (centered adt)', color=mvp_color, zorder=5)
    plt.plot(df_mvp['dist'], adtc_swot_extra,
             label='SWOT (centered adt)', color=swot_color_adt, zorder=3)
    plt.plot(df_mvp['dist'], adtc_duacs,
             label='DUACS (centered adt)', color=swot_color_total, zorder=2)
    plt.xlabel('Distance along transect (km)')
    plt.ylabel('Dynamic Topography Anomaly (m)')
    plt.legend(loc='lower right', fontsize=7)
    plt.title(f'PL{PL} - Extra Plot 3: Centered anomalies comparison')
    plt.show()

    ### 6.3.4 Subplots – centred anomalies & differences (I)
    fig, axs = plt.subplots(2, 1, sharex=True)
    axs[0].plot(df_mvp['dist'], df_mvp['dhc'],
                label='MVP (centered adt)', color=mvp_color, zorder=5)
    axs[0].plot(df_mvp['dist'], adtc_swot_extra,
                label='SWOT (centered adt)', color=swot_color_adt, zorder=3)
    axs[0].plot(df_mvp['dist'], adtc_duacs,
                label='DUACS (centered adt)', color=swot_color_total, zorder=2)
    axs[0].plot([df_mvp['dist'].min(), df_mvp['dist'].max()], [0, 0],
                '-', color='grey', zorder=1)
    axs[0].set_ylabel('Dynamic Topography Anomaly (m)')
    axs[0].legend(loc='lower right', fontsize=7)
    axs[0].set_ylim(-0.08, 0.08)

    axs[1].plot(df_mvp['dist'], adtc_swot_extra - df_mvp['dhc'],
                label='SWOT - MVP', color=swot_color_adt, zorder=3)
    axs[1].plot(df_mvp['dist'], adtc_duacs - df_mvp['dhc'],
                label='DUACS - MVP', color=swot_color_total, zorder=3)
    axs[1].plot(df_mvp['dist'], adtc_swot_extra - adtc_duacs,
                label='SWOT - DUACS', color='orange', zorder=3)
    axs[1].plot([df_mvp['dist'].min(), df_mvp['dist'].max()], [0, 0],
                '-', color='grey', zorder=1)
    axs[1].set_xlabel('Distance (km)')
    axs[1].set_ylabel('Measurement difference (m)')
    axs[1].legend(loc='lower right', fontsize=7)
    axs[1].set_ylim(-0.06, 0.06)
    plt.tight_layout()
    plt.suptitle(
        f'PL{PL} - Extra Plot 4: Subplots – Centered anomalies & differences (I)',
        y=1.02
    )
    plt.show()

    ### 6.3.5 Subplots – centred anomalies & differences (II)
    fig, axs = plt.subplots(2, 1, sharex=True)
    axs[0].plot(df_mvp['dist'], df_mvp['dhc'],
                label='MVP (centered adt)', color=mvp_color, zorder=5)
    axs[0].plot(df_mvp['dist'], adtc_swot_extra,
                label='SWOT (centered adt)', color=swot_color_adt, zorder=3)
    axs[0].plot(df_mvp['dist'], adtc_duacs,
                label='DUACS (centered adt)', color=swot_color_total, zorder=2)
    axs[0].plot([df_mvp['dist'].min(), df_mvp['dist'].max()], [0, 0],
                '-', color='grey', zorder=1)
    axs[0].set_ylabel('Dynamic Topography Anomaly (m)')
    axs[0].legend(loc='lower right', fontsize=7)
    axs[0].set_ylim(-0.08, 0.08)

    axs[1].plot(df_mvp['dist'], df_mvp['dhc'] - adtc_swot_extra,
                label='MVP - SWOT', color=swot_color_adt, zorder=3)
    axs[1].plot(df_mvp['dist'], df_mvp['dhc'] - adtc_duacs,
                label='MVP - DUACS', color=swot_color_total, zorder=3)
    axs[1].plot(df_mvp['dist'], adtc_swot_extra - adtc_duacs,
                label='SWOT - DUACS', color='orange', zorder=3)
    axs[1].plot([df_mvp['dist'].min(), df_mvp['dist'].max()], [0, 0],
                '-', color='grey', zorder=1)
    axs[1].set_xlabel('Distance (km)')
    axs[1].set_ylabel('Measurement difference (m)')
    axs[1].legend(loc='lower right', fontsize=7)
    axs[1].set_ylim(-0.06, 0.06)
    plt.tight_layout()
    plt.suptitle(
        f'PL{PL} - Extra Plot 5: Subplots – Centered anomalies & differences (II)',
        y=1.02
    )
    plt.show()

2nd Code - Reference Level Optimization for SWOT-MVP Cross-Validation

This second section of the notebook implements a systematic methodology for optimizing the reference pressure level used in dynamic height calculations. This optimization is crucial for accurate comparison between in-situ MVP measurements and satellite altimetry data.

Dynamic height calculations from oceanographic profiles require a reference "level of no motion": a depth at which horizontal flow is assumed negligible. Traditionally the deepest measurement depth is used, but that choice is not necessarily optimal for satellite validation. This code therefore implements a data-driven approach to identify the reference depth that maximizes agreement between MVP and SWOT measurements.

This optimization framework:

  1. Defines a utility function that processes MVP data using different candidate reference levels
  2. Implements a parameter sweep across reference depths from 180 to 360 meters
  3. Calculates correlation metrics between MVP and SWOT for each candidate depth
  4. Identifies the optimal reference level that maximizes this correlation
  5. Visualizes the sensitivity of the correlation to different reference depths

The analysis integrates data from all available transects (PL5-PL24) to ensure the optimization is robust and not biased by individual profiles. This aggregated approach provides a comprehensive assessment of how reference level choice affects the agreement between in-situ and satellite measurements.

Understanding the optimal reference level has important implications beyond validation, providing insight into the vertical structure of ocean circulation in the Northwestern Mediterranean and improving our ability to accurately estimate geostrophic velocities from density profiles.
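Steps 2 through 4 above can be sketched as follows. This is a minimal, self-contained illustration with synthetic data standing in for the real MVP and SWOT arrays; `dyn_height` is a toy stand-in for the actual per-profile computation (`gsw.geo_strf_dyn_height(SA, CT, p, p_ref=p_ref)[0] / g`), and the names and numbers are invented for the example.

```python
import numpy as np

n_profiles = 50
signal = np.sin(np.linspace(0, 2 * np.pi, n_profiles))       # "true" along-track structure
distortion = np.cos(np.linspace(0, 6 * np.pi, n_profiles))   # spurious signal from a poor reference level
adt_swot = signal                                            # synthetic SWOT ADT along the transect

def dyn_height(p_ref):
    """Toy dynamic height: the distortion shrinks as the reference level deepens."""
    bias = (360.0 - p_ref) / 360.0
    return signal + bias * distortion

# Step 2: sweep candidate reference depths (180-360 dbar, as in the text)
p_refs = np.arange(180, 361, 20)
# Step 3: correlation metric between "MVP" and "SWOT" for each candidate
corrs = [np.corrcoef(dyn_height(p), adt_swot)[0, 1] for p in p_refs]
# Step 4: pick the level that maximizes the correlation
best = int(p_refs[np.argmax(corrs)])
print(f"optimal reference level: {best} dbar (r = {max(corrs):.3f})")
```

In the real pipeline the correlation is computed with `scipy.stats.pearsonr` over all transects pooled together, but the sweep-and-argmax structure is the same.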

1. Import Libraries

1.1 External and Standard Modules

This section imports the necessary Python libraries for our reference level optimization analysis:

  • numpy and pandas: Core data manipulation libraries for numerical operations and structured data handling
  • matplotlib.pyplot: Visualization library for creating plots of the sensitivity analysis results
  • gsw: Gibbs SeaWater toolbox for oceanographic calculations following the TEOS-10 standard
  • scipy.io and scipy.spatial: For loading MATLAB files and implementing spatial data structures (cKDTree)
  • scipy.stats: Provides the Pearson correlation function used in our validation metrics
  • xarray: Library for working with labeled multi-dimensional arrays, particularly NetCDF files
  • geopy.distance: For calculating geodesic distances along MVP transects
  • glob and os: File operation utilities for locating SWOT and MVP data files
  • datetime: For handling time information and matching satellite passes with MVP transects
  • tqdm: Progress bar library to monitor the execution of our parameter sweep

This analysis pipeline builds on the previous validation framework but focuses specifically on optimizing the reference pressure level used for dynamic height calculations. By systematically testing different reference depths, we can identify which level produces the best agreement between MVP and SWOT measurements across all available transects.

In [ ]:
## 1.1 External and standard modules
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import glob, os
from datetime import datetime, timedelta
import gsw
import scipy.io as sio
import geopy.distance
from scipy.spatial import cKDTree
from scipy.stats import pearsonr
import xarray as xr
from scipy.interpolate import griddata
from tqdm import tqdm

2. Auxiliary Functions

2.1 Scalar and Value Extraction

When working with oceanographic data formats, particularly MATLAB files loaded into Python via scipy.io, we often encounter deeply nested data structures that need to be flattened and simplified. The two functions in this section address this challenge:

get_scalar: Single-Value Extraction

This utility function recursively traverses nested data structures to extract a single scalar value. This is particularly useful for extracting individual metadata elements like timestamps or specific measurement values from complex data structures.

The function handles different container types (lists, tuples, NumPy arrays) and applies appropriate extraction methods based on the container type:

  • For NumPy arrays with only a single element, it uses the .item() method
  • For larger arrays, it takes the first element and continues processing
  • For other iterables, it extracts the first element

extract_values: Comprehensive Flattening

For more complex nested structures, the extract_values function performs a complete flattening, extracting all primitive values into a single list. It employs a recursive approach:

  1. It defines an inner function recurse that processes each element
  2. For NumPy object arrays (dtype='O'), it recurses into the array
  3. For regular NumPy arrays, it extracts the scalar value
  4. For tuples, it processes each item individually
  5. For all other values, it appends them directly to the output list

These functions are essential when processing the MVP data files, which contain metadata like date information and oceanographic measurements stored in complex nested structures. By efficiently extracting and organizing these values, we can proceed with the analysis without being hindered by the original data format.

In [ ]:
## 2.1 Scalar and value extraction
def get_scalar(val):
    """Extract a scalar from a possibly nested iterable."""
    while isinstance(val, (list, tuple, np.ndarray)):
        if isinstance(val, np.ndarray):
            if val.size == 1:
                val = val.item()
            else:
                val = val[0]
        else:
            val = val[0]
    return val


def extract_values(nested_array):
    """Flatten a nested array and return the values as a list."""
    values = []

    def recurse(arr):
        for element in arr:
            if isinstance(element, np.ndarray):
                if element.dtype == 'O':
                    recurse(element)
                else:
                    values.append(element.item())
            elif isinstance(element, tuple):
                for item in element:
                    recurse(item)
            else:
                values.append(element)

    recurse(nested_array)
    return values
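
As a quick sanity check, the two helpers can be exercised on a synthetic nested structure of the kind scipy.io.loadmat typically produces (the definitions are repeated so the snippet is self-contained; the numeric values are made up):

```python
import numpy as np

def get_scalar(val):
    """Extract a scalar from a possibly nested iterable."""
    while isinstance(val, (list, tuple, np.ndarray)):
        if isinstance(val, np.ndarray):
            val = val.item() if val.size == 1 else val[0]
        else:
            val = val[0]
    return val

def extract_values(nested_array):
    """Flatten a nested array and return the values as a list."""
    values = []
    def recurse(arr):
        for element in arr:
            if isinstance(element, np.ndarray):
                if element.dtype == 'O':
                    recurse(element)
                else:
                    values.append(element.item())
            elif isinstance(element, tuple):
                for item in element:
                    recurse(item)
            else:
                values.append(element)
    recurse(nested_array)
    return values

# An object array mimicking what loadmat returns for a struct field:
# each entry is a 2-D singleton array wrapping one MATLAB datenum
nested = np.empty(3, dtype=object)
nested[0] = np.array([[738965.25]])
nested[1] = np.array([[738966.50]])
nested[2] = np.array([[738967.75]])

print(get_scalar(np.array([[42.0]])))  # 42.0
print(extract_values(nested))          # [738965.25, 738966.5, 738967.75]
```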

2.2 Data Loading Utilities¶

The loadsshuv function provides a standardized interface for loading satellite altimetry data from NetCDF files, with special handling for SWOT data products. Similar to the implementation in the previous code section, this function ensures consistent access to oceanographic variables across different data sources.

Function Details¶

loadsshuv(fname, rep, varn, **kwargs)

This function handles two primary data loading scenarios based on the source type:

  1. SWOT data - When kwargs['type'] == 'swot':

    • Applies geographic filtering based on the specified area
    • Converts xarray DataArrays to NumPy masked arrays
    • Removes rows containing only NaN values
    • Computes Absolute Dynamic Topography (ADT) from Sea Surface Height Anomaly (SSHA) and Mean Dynamic Topography (MDT)
  2. Other altimetry products (e.g., DUACS):

    • Uses a more generic approach with variable names provided in the varn dictionary

The function returns a consistent dictionary of oceanographic variables regardless of the source, enabling seamless comparison between different altimetry products. This standardization is crucial for the parameter optimization analysis, as it allows us to apply the same processing pipeline to different datasets.

The returned variables include:

  • Geographical coordinates (lon, lat)
  • Sea surface height components (ssh, ssha)
  • Geostrophic velocity components (u, v)
  • Absolute Dynamic Topography (adt)

This uniform data structure facilitates the consistent application of validation metrics across different reference level candidates in our sensitivity analysis.

In [ ]:
##2.2 Data loading utilities
def loadsshuv(fname, rep, varn, **kwargs):
    """
    Load sea-surface height and velocity data from a netCDF file.
    """
    if 'type' in kwargs and kwargs['type'] == 'swot':
        filei2 = glob.glob(rep + fname)[0]
        ds = xr.open_dataset(filei2)
        if 'area' in kwargs:
            area = kwargs['area']
            selection = ((ds.longitude > area[0]) &
                         (ds.longitude < area[2]) &
                         (ds.latitude > area[1]) &
                         (ds.latitude < area[3]))
            ds = ds.where(selection)
        lons = ds.longitude.to_masked_array()
        lats = ds.latitude.to_masked_array()
        us = ds.ugos.to_masked_array()
        vs = ds.vgos.to_masked_array()
        ssha = ds.ssha_filtered.to_masked_array()
        mss = ds.mss.to_masked_array()
        mdt = ds.mdt.to_masked_array()
        mask_row = np.all(np.isnan(us), axis=1)
        u = us[~mask_row, :]
        v = vs[~mask_row, :]
        lon = lons[~mask_row, :]
        lat = lats[~mask_row, :]
        ssha = ssha[~mask_row, :]
        mss = mss[~mask_row, :]
        mdt = mdt[~mask_row, :]
        adt = ssha + mdt
    else:
        filei = glob.glob(rep + fname)[0]
        ds = xr.open_dataset(filei)
        lon = ds[varn['longitude']][:]
        lat = ds[varn['latitude']][:]
        ssha = ds[varn['ssh']][:]
        adt = ds['adt'][:]
        u = ds[varn['u']][:]
        v = ds[varn['v']][:]
    return {'lon': lon, 'lat': lat, 'ssh': ssha, 'u': u, 'v': v,
            'ssha': ssha, 'adt': adt}
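
The row-filtering and ADT-reconstruction logic at the heart of the SWOT branch can be illustrated in isolation with small synthetic arrays (invented values, not real SWOT data):

```python
import numpy as np

# Synthetic 4x3 swath: one row is entirely NaN
# (e.g. it fell outside the area selection)
us = np.array([[0.1, 0.2, 0.3],
               [np.nan, np.nan, np.nan],
               [0.4, 0.5, 0.6],
               [0.7, 0.8, 0.9]])
ssha = np.full_like(us, 0.05)
mdt  = np.full_like(us, -0.10)

# Drop rows where every velocity value is NaN, as loadsshuv does
mask_row = np.all(np.isnan(us), axis=1)
ssha_kept = ssha[~mask_row, :]
mdt_kept  = mdt[~mask_row, :]

# Absolute Dynamic Topography = SSH anomaly + Mean Dynamic Topography
adt = ssha_kept + mdt_kept
print(adt.shape)   # (3, 3)
print(adt[0, 0])   # -0.05
```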

2.3 Match-up Computation¶

The calc_matchup_for_PL function is the core component of our sensitivity analysis, performing a complete end-to-end processing pipeline for comparing MVP and SWOT measurements along a single transect (PL). This function accepts an optional candidate parameter that specifies the reference pressure level to use for dynamic height calculations.

Function Details¶

calc_matchup_for_PL(PL, candidate=None)

Parameters:

  • PL: The profile number to process ("PL" abbreviates plongée, French for "dive")
  • candidate: Optional reference pressure level (depth) for dynamic height calculation

The function follows a comprehensive four-part processing workflow:

  1. MVP Data Processing and Dynamic Height Computation

    • Loads MVP data from MATLAB files
    • Extracts dates, location coordinates, and oceanographic measurements
    • Converts practical salinity to absolute salinity and in-situ temperature to conservative temperature
    • Computes dynamic height using either the maximum pressure as reference (p_ref) or the specified candidate value
  2. Transect Construction and Cumulative Distance Calculation

    • Creates a pandas DataFrame with dynamic height values and coordinates
    • Computes cumulative distance along the transect using geodesic calculations
  3. Regular-Grid Interpolation and Smoothing

    • Interpolates measurements to a regular 1 km grid
    • Applies an 11 km smoothing window to filter high-frequency variations
    • Handles edge effects through forward and backward filling
  4. SWOT Co-location and Correlation

    • Identifies and loads the SWOT data for the corresponding date
    • Uses a k-d tree for efficient nearest-neighbor matching of SWOT data to MVP points
    • Computes centered anomalies for both datasets
    • Returns paired (MVP, SWOT) values for correlation analysis

The function includes comprehensive error handling, returning (None, None) if any processing stage fails.

This function's ability to accept different reference pressure levels makes it ideal for our sensitivity analysis. By varying the candidate parameter, we can evaluate how different reference depths affect the agreement between MVP and SWOT measurements across all available transects.

In [ ]:
## 2.3 Match-up computation
def calc_matchup_for_PL(PL, candidate=None):
    """
    Compute the MVP–SWOT dynamic-height match-up for a single profile.
    """
    try:

Part 1: MVP Processing and Dynamic Height Computation¶

This section handles the loading and processing of Moving Vessel Profiler (MVP) data to compute dynamic height. The dynamic height is a fundamental oceanographic quantity that represents the vertical integral of density anomalies, providing a measure of pressure gradients that drive geostrophic flow.

The process starts by loading two MATLAB files:

  • The first contains metadata, particularly the timestamps of the measurements
  • The second contains the actual oceanographic measurements (temperature, salinity, pressure)

For timestamp processing, the code:

  • Extracts dates using the helper functions defined earlier
  • Converts from MATLAB's datenum format to Python datetime objects
  • Accounts for the 366-day offset between MATLAB and Python date origins

The oceanographic measurements are then processed following the TEOS-10 standard:

  1. Variable conversion: Converting Practical Salinity (SP) to Absolute Salinity (SA) and in-situ temperature to Conservative Temperature (CT)
  2. Dynamic height calculation: Computing the geostrophic streamfunction relative to a reference pressure

The key physical concept here is the dynamic height calculation, which is based on the vertical integration of specific volume anomaly:

$$\Delta D = \int_{p}^{p_{\text{ref}}} \delta(S_A, \Theta, p') \, dp'$$

Where:

  • $\Delta D$ is the dynamic height anomaly
  • $p_{\text{ref}}$ is the reference pressure level
  • $\delta$ is the specific volume anomaly, a function of Absolute Salinity ($S_A$), Conservative Temperature ($\Theta$), and pressure

Note that the integration runs from the observation pressure $p$ up to $p_{\text{ref}}$, following the TEOS-10 convention implemented by gsw.geo_strf_dyn_height.

The choice of reference pressure ($p_{\text{ref}}$) is crucial and is the focus of our optimization:

  • If candidate is None, we use the maximum pressure for each cast
  • Otherwise, we use the specified candidate reference level

Finally, the function computes centered anomalies by subtracting the mean, allowing for comparison with satellite measurements independent of absolute reference levels.

In [ ]:
# Part 1 – MVP processing and dynamic-height computation
        mvp_mat = sio.loadmat(f'/home/bessiere/Documents/MVP_TOOLS/BIOSWOTMED/MVP/L2/PL{PL}.mat')
        date_raw = mvp_mat['metadata']['date']
        date_extract = extract_values(date_raw)
        dates_mvp = []
        for d_val in date_extract:
            scalar_val = get_scalar(d_val)
            dt = datetime.fromordinal(int(float(scalar_val))) + timedelta(days=float(scalar_val) % 1) - timedelta(days=366)
            dates_mvp.append(dt)

        mvp_mat2 = sio.loadmat(f'/home/bessiere/Documents/MVP_TOOLS/BIOSWOTMED/MVP/L2_python/PL{PL}_python.mat')
        lon_mvp = mvp_mat2['lon']
        lat_mvp = mvp_mat2['lat']
        p_mvp   = mvp_mat2['pres']
        t_mvp   = mvp_mat2['temp']
        SP_mvp  = mvp_mat2['sal']

        SA = gsw.SA_from_SP(SP_mvp, p_mvp, lon_mvp.T, lat_mvp.T)
        CT = gsw.CT_from_t(SA, t_mvp, p_mvp)
        g_ = 9.81

        dh = []
        for i in range(SA.shape[0]):
            valid = ~np.isnan(SA[i])
            if np.sum(valid) == 0:
                dh.append(np.nan)
                continue
            p_ref = np.nanmax(p_mvp[i][valid]) if candidate is None else candidate
            if np.isnan(p_ref):
                dh.append(np.nan)
                continue
            dh_i = gsw.geo_strf_dyn_height(SA[i][valid], CT[i][valid],
                                           p_mvp[i][valid], p_ref=p_ref)[0] / g_
            if isinstance(dh_i, np.ndarray):
                dh_i = dh_i.item() if dh_i.size == 1 else dh_i[0]
            dh.append(dh_i)
        dh = np.array(dh)
        if np.all(np.isnan(dh)):
            return None, None
        dh_mean = np.nanmean(dh)
        dhc = dh - dh_mean  # centred anomaly
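
The datenum conversion used above can be checked in isolation. MATLAB serial dates count days from year 0, which sits 366 days before Python's ordinal origin, hence the offset. A minimal sketch (the example value corresponds to MATLAB's datenum(2023, 4, 15, 12, 0, 0)):

```python
from datetime import datetime, timedelta

def matlab_datenum_to_datetime(datenum):
    """Convert a MATLAB serial date number to a Python datetime."""
    return (datetime.fromordinal(int(datenum))
            + timedelta(days=datenum % 1)     # fractional day -> time of day
            - timedelta(days=366))            # MATLAB/Python origin offset

dt = matlab_datenum_to_datetime(738991.5)
print(dt)  # 2023-04-15 12:00:00
```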

Part 2: Transect Construction and Cumulative Distance¶

After computing dynamic height values, the code organizes the data into a structured format and calculates the along-track distance, which will serve as the independent variable for subsequent interpolation and analysis.

The process involves:

  1. Data organization: Creating a pandas DataFrame that contains:

    • Raw dynamic height (dh)
    • Centered dynamic height anomalies (dhc)
    • Geographic coordinates (longitude and latitude)
  2. Cumulative distance calculation: Computing the along-track distance by:

    • Initializing a zero array for distances
    • For each point, adding the geodesic distance from the previous point
    • Using geopy's distance function to account for Earth's curvature

The geodesic distance calculation is important because it provides an accurate measure of distance along the ship track, accounting for the fact that the Earth is not flat. This is particularly relevant in oceanographic surveys that may span significant distances.

The resulting distance metric serves multiple purposes:

  • It provides a natural coordinate system for the transect
  • It facilitates regular-grid interpolation in the next step
  • It enables consistent comparisons between different measurement systems

By organizing the data in this structured format with a well-defined distance metric, we create a foundation for the spatial processing operations that follow.

In [ ]:
# Part 2 – Transect construction and cumulative distance
        N = len(dhc)
        df_mvp_raw = pd.DataFrame({
            'dh': dh,
            'dhc': dhc,
            'lon': lon_mvp[0][:N],
            'lat': lat_mvp[0][:N]
        })
        dist = np.zeros(N)
        for i in range(1, N):
            dist[i] = dist[i-1] + geopy.distance.distance(
                (df_mvp_raw['lat'].iloc[i-1], df_mvp_raw['lon'].iloc[i-1]),
                (df_mvp_raw['lat'].iloc[i],   df_mvp_raw['lon'].iloc[i])
            ).km
        df_mvp_raw['dist'] = dist
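
The cumulative-distance idea can be sketched without the MVP data. The notebook uses geopy's geodesic distance (WGS-84 ellipsoid); the haversine formula below is a self-contained spherical stand-in that illustrates the same running-sum construction, applied to three hypothetical waypoints:

```python
import numpy as np

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km (spherical approximation; the notebook
    itself uses geopy's ellipsoidal geodesic distance)."""
    R = 6371.0  # mean Earth radius in km
    phi1, phi2 = np.radians(lat1), np.radians(lat2)
    dphi = np.radians(lat2 - lat1)
    dlmb = np.radians(lon2 - lon1)
    a = np.sin(dphi / 2)**2 + np.cos(phi1) * np.cos(phi2) * np.sin(dlmb / 2)**2
    return 2 * R * np.arcsin(np.sqrt(a))

# Hypothetical waypoints along a short transect
lats = np.array([42.00, 42.05, 42.10])
lons = np.array([5.00, 5.05, 5.10])

# Running sum of segment lengths, exactly as in the function above
dist = np.zeros(len(lats))
for i in range(1, len(lats)):
    dist[i] = dist[i-1] + haversine_km(lats[i-1], lons[i-1], lats[i], lons[i])
print(np.round(dist, 2))  # cumulative along-track distance in km
```

Each 0.05° step at this latitude is roughly 7 km, so the cumulative distances grow near-linearly along the track.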

Part 3: Regular-Grid Interpolation and Smoothing¶

This code segment performs two critical data preprocessing steps: spatial interpolation and smoothing of oceanographic measurements.

  1. Interpolation Parameters:

    • Sets a uniform 1 km spatial sampling (step_km)
    • Defines an 11 km smoothing window (window_size_km)
    • Creates a regularly spaced distance grid from 0 to maximum transect distance
  2. Linear Interpolation:

    • Uses NumPy's interp() to map original measurements onto a regular grid
    • Interpolates longitude, latitude, dynamic height, and dynamic height anomaly
    • Ensures consistent spatial resolution across different variables
  3. Smoothing Technique:

    • Applies a centered rolling window mean using pandas rolling() method
    • Handles edge cases with bfill() and ffill() to manage data boundaries
    • Enforces an odd window size to maintain symmetry
    • Reduces high-frequency noise while preserving mesoscale features
    • Computes smoothed values for raw dynamic height and its anomaly

Mathematical representations:

  • Interpolation: $\text{var}_{\text{reg}} = \text{interp}(\text{dist}_{\text{reg}}, \text{dist}_{\text{raw}}, \text{var}_{\text{raw}})$
  • Smoothing: $\text{var}_{\text{smooth}}[i] = \frac{1}{W} \sum_{j=-(W-1)/2}^{(W-1)/2} \text{var}_{\text{reg}}[i+j]$, where $W$ is the (odd) window length in grid points

The resulting DataFrame provides a consistently gridded and smoothed dataset, prepared for further oceanographic analysis.

Key aspects of the preprocessing:

  • Handles potential edge cases in window size
  • Uses robust interpolation and smoothing techniques
  • Maintains data integrity through careful handling of missing values
In [ ]:
# Part 3 – Regular-grid interpolation and smoothing
        step_km = 1      # horizontal sampling interval (km)
        window_size_km = 11  # smoothing window (km)
        dist_max = df_mvp_raw['dist'].iloc[-1]
        dist_reg = np.arange(0, dist_max + step_km, step_km)
        lon_reg = np.interp(dist_reg, df_mvp_raw['dist'], df_mvp_raw['lon'])
        lat_reg = np.interp(dist_reg, df_mvp_raw['dist'], df_mvp_raw['lat'])
        dh_reg  = np.interp(dist_reg, df_mvp_raw['dist'], df_mvp_raw['dh'])
        dhc_reg = np.interp(dist_reg, df_mvp_raw['dist'], df_mvp_raw['dhc'])
        window_size = int(window_size_km / step_km)
        if window_size < 1:
            window_size = 1
        if window_size % 2 == 0:
            window_size += 1
        dh_reg_smooth = (pd.Series(dh_reg)
                         .rolling(window=window_size, center=True)
                         .mean()
                         .bfill()
                         .ffill()
                         .values)
        dhc_reg_smooth = (pd.Series(dhc_reg)
                          .rolling(window=window_size, center=True)
                          .mean()
                          .bfill()
                          .ffill()
                          .values)
        df_mvp = pd.DataFrame({
            'dist': dist_reg,
            'lon': lon_reg,
            'lat': lat_reg,
            'dh': dh_reg_smooth,
            'dhc': dhc_reg_smooth
        })
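
The smoothing step can be demonstrated on a synthetic 1 km-gridded signal (a sine "mesoscale" trend plus noise; invented values, same rolling/bfill/ffill pattern as above):

```python
import numpy as np
import pandas as pd

# Synthetic regularly gridded signal: smooth trend plus high-frequency noise
rng = np.random.default_rng(0)
dist_reg = np.arange(0, 50, 1.0)                    # km, 1 km sampling
signal = 0.1 * np.sin(2 * np.pi * dist_reg / 40.0)  # mesoscale-like trend
noisy = signal + 0.02 * rng.standard_normal(dist_reg.size)

window_size = 11  # 11 km window at 1 km sampling (odd, so it is centred)
smooth = (pd.Series(noisy)
          .rolling(window=window_size, center=True)
          .mean()
          .bfill()   # fill the half-window of NaNs at the start...
          .ffill()   # ...and at the end
          .values)

# The smoothed series has no NaNs and lower variance than the noisy one
print(np.isnan(smooth).any(), noisy.std() > smooth.std())
```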

Part 4: SWOT Co-location¶

The final part of the match-up process involves finding the corresponding SWOT satellite data for the MVP transect and spatially co-locating the measurements. This enables direct comparison between in-situ and satellite observations at coincident points.

The co-location process follows these steps:

  1. Temporal matching:

    • Extracts the date from MVP measurements
    • Identifies SWOT data files from the same day
    • Returns early if no matching SWOT data exists
  2. Spatial subsetting:

    • Defines a geographic domain for the Northwestern Mediterranean (3°-6.5°E, 38.5°-44°N)
    • Loads SWOT data within this domain using the loadsshuv function
    • Filters out invalid (NaN) data points
  3. Nearest-neighbor matching:

    • Creates a k-d tree spatial index from valid SWOT coordinates
    • For each MVP measurement point, finds the closest SWOT observation
    • Extracts the corresponding ADT values
  4. Centering and validation:

    • Computes the mean SWOT ADT and subtracts it to create centered anomalies
    • Identifies valid points (non-NaN in both datasets)
    • Returns the paired MVP and SWOT values for valid points only

The k-d tree approach provides an efficient spatial matching method that scales well with large datasets. This is important for SWOT data, which has much higher spatial resolution than traditional altimetry.

The function's final output consists of two arrays:

  • mvp_vals: Centered dynamic height anomalies from MVP
  • swot_vals: Centered ADT anomalies from SWOT

These paired values form the basis for our correlation analysis, allowing us to assess how well SWOT measurements agree with in-situ data for different reference pressure levels.

In [ ]:
# Part 4 – SWOT co-location
        if len(dates_mvp) == 0:
            return None, None
        day_str = dates_mvp[0].strftime('%Y-%m-%d')
        day_dt  = datetime.strptime(day_str, "%Y-%m-%d")
        domain = [3, 38.5, 6.5, 44]
        varn_swot = {'longitude': 'longitude', 'latitude': 'latitude',
                     'u': 'ugos', 'v': 'vgos', 'adt': 'adt'}
        swot_files = glob.glob(f'/home/favaloro/Téléchargements/SWOT_V2.0.0/v2.0.0/'
                               f'SWOT_L3_LR_SSH_Expert_*{day_dt.strftime("%Y%m%d")}*_v2.0.0.nc')
        if len(swot_files) == 0:
            return None, None
        swot = loadsshuv(swot_files[0], '', varn_swot, type='swot', area=domain)
        adt_swot_raw = swot['adt'].flatten()
        lon_swot = swot['lon'].flatten()
        lat_swot = swot['lat'].flatten()
        valid_mask = (~np.isnan(lon_swot) & ~np.isnan(lat_swot) & ~np.isnan(adt_swot_raw))
        lon_swot_valid = lon_swot[valid_mask]
        lat_swot_valid = lat_swot[valid_mask]
        adt_swot_valid = adt_swot_raw[valid_mask]
        if len(lon_swot_valid) == 0:
            return None, None

        tree = cKDTree(np.c_[lon_swot_valid, lat_swot_valid])
        points_mvp = np.c_[df_mvp['lon'].values, df_mvp['lat'].values]
        _, indices = tree.query(points_mvp, k=1)
        swot_adt_match = adt_swot_valid[indices]  # nearest-neighbour values

        swot_mean = np.nanmean(swot_adt_match)
        swot_adt_centered = swot_adt_match - swot_mean

        valid_pts = (~np.isnan(df_mvp['dhc'].values) & ~np.isnan(swot_adt_centered))
        if np.sum(valid_pts) == 0:
            return None, None

        mvp_vals = df_mvp['dhc'].values[valid_pts]
        swot_vals = swot_adt_centered[valid_pts]
        return mvp_vals, swot_vals
    except Exception as e:
        print(f"Error in PL{PL}: {e}")
        return None, None
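
The k-d tree co-location step reduces to a few lines, shown here with hypothetical SWOT grid points and MVP positions (invented coordinates and ADT values):

```python
import numpy as np
from scipy.spatial import cKDTree

# Hypothetical SWOT samples: (lon, lat) coordinates and ADT values
swot_pts = np.array([[5.00, 42.00], [5.10, 42.00],
                     [5.00, 42.10], [5.10, 42.10]])
swot_adt = np.array([0.10, 0.12, 0.08, 0.11])

# Two hypothetical MVP transect points
mvp_pts = np.array([[5.01, 42.01], [5.09, 42.09]])

tree = cKDTree(swot_pts)
_, idx = tree.query(mvp_pts, k=1)        # nearest SWOT sample per MVP point
adt_match = swot_adt[idx]
adt_centered = adt_match - adt_match.mean()  # centred anomaly, as in Part 4

print(idx)           # [0 3]
print(adt_centered)  # approx. [-0.005  0.005]
```

Note that cKDTree works in lon/lat degrees here, as in the notebook; over a small domain the resulting nearest neighbours are a good approximation despite the anisotropy of degrees of longitude.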

3. Aggregated Sensitivity Analysis¶

3.1 Parameter Sweep¶

This section implements a systematic parameter sweep to find the optimal reference pressure level for dynamic height calculations. The ideal "level of no motion" should maximize the correlation between MVP-measured dynamic height and SWOT-observed absolute dynamic topography.

The analysis explores reference depths from 360 m down to 190 m in 10 m steps (the np.arange stop value of 180 m is exclusive). For each candidate depth, the code:

  1. Computes match-ups for all transects:

    • Processes each profile (PL5 through PL24)
    • Calculates dynamic height using the candidate reference level
    • Collects valid MVP and SWOT value pairs
  2. Aggregates results across transects:

    • Combines data from all successful transects
    • Creates large combined datasets for robust statistical analysis
  3. Calculates correlation metrics:

    • Computes the Pearson correlation coefficient between MVP and SWOT measurements
    • Handles edge cases where standard deviations are too small
  4. Identifies optimal reference level:

    • Finds the candidate depth with the maximum correlation
    • Reports the best depth and corresponding correlation value

This approach has several advantages over analyzing individual transects:

  • It provides more robust statistics by leveraging data from all available profiles
  • It identifies a globally optimal reference level rather than transect-specific optima
  • It mitigates the impact of outliers or problematic individual transects

The Pearson correlation coefficient is an appropriate metric because:

$$r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \sum_{i=1}^{n} (y_i - \bar{y})^2}}$$

Where $x_i$ represents MVP measurements, $y_i$ represents SWOT measurements, and $\bar{x}$ and $\bar{y}$ are their respective means.

This metric quantifies the linear relationship between the two datasets, with values closer to 1 indicating stronger agreement. By selecting the reference depth that maximizes this correlation, we identify the level that produces the most consistent comparison between in-situ and satellite measurements.

In [ ]:
## 3.1 Parameter sweep
candidate_depths = np.arange(360, 180, -10)  # candidate reference levels (m)
agg_corr_values = []

for cand in candidate_depths:
    agg_mvp_list = []
    agg_swot_list = []
    for PL in np.arange(5, 25):
        mvp_vals, swot_vals = calc_matchup_for_PL(PL, candidate=cand)
        if mvp_vals is not None and len(mvp_vals) > 0:
            agg_mvp_list.append(mvp_vals)
            agg_swot_list.append(swot_vals)
    if len(agg_mvp_list) == 0:
        agg_corr_values.append(np.nan)
    else:
        agg_mvp_all = np.concatenate(agg_mvp_list)
        agg_swot_all = np.concatenate(agg_swot_list)
        if np.std(agg_mvp_all) < 1e-12 or np.std(agg_swot_all) < 1e-12:
            agg_corr_values.append(np.nan)
        else:
            r, _ = pearsonr(agg_mvp_all, agg_swot_all)
            agg_corr_values.append(r)

agg_corr_values = np.array(agg_corr_values)
best_index = np.nanargmax(agg_corr_values)
best_cand_agg = candidate_depths[best_index]
print(f"Best candidate depth (p_ref) is {best_cand_agg} m with Pearson r = {agg_corr_values[best_index]:.3f}")
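
The sweep-and-select logic can be illustrated with synthetic data: correlate a reference signal against progressively noisier copies (the noise levels stand in for candidate depths) and pick the candidate maximizing Pearson r, exactly as the loop above does:

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(1)
x = rng.standard_normal(200)  # stands in for the aggregated MVP anomalies

corr_values = []
for noise_level in [0.1, 0.5, 1.0, 2.0]:  # stand-ins for candidate depths
    y = x + noise_level * rng.standard_normal(200)  # "SWOT" counterpart
    r, _ = pearsonr(x, y)
    corr_values.append(r)

corr_values = np.array(corr_values)
best = np.nanargmax(corr_values)  # index of the strongest agreement
print(best)  # 0: the least-noisy candidate correlates best
```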

3.2 Visualization¶

The final step in our sensitivity analysis is to visualize how the correlation between MVP and SWOT measurements varies with different reference pressure levels. This graphical representation provides clear insight into the relationship between reference depth and measurement agreement.

The visualization consists of a line plot with markers showing the Pearson correlation coefficient for each candidate reference depth:

  • The x-axis represents the candidate reference depths from 360 down to 190 meters
  • The y-axis shows the corresponding Pearson correlation values
  • Markers highlight each tested depth value for easy identification
  • A connecting line helps visualize the trend across the parameter space

This plot reveals several important features about the relationship between reference depth and MVP-SWOT agreement:

  1. Correlation trend: We can observe whether deeper or shallower reference levels generally produce better agreement
  2. Optimal depth: The peak of the curve indicates the optimal reference level
  3. Sensitivity: The slope indicates how sensitive the results are to changes in reference depth
  4. Stability: Plateau regions suggest ranges of depths that perform similarly well

From an oceanographic perspective, this analysis provides valuable insight into the vertical structure of water movement in the Northwestern Mediterranean. The optimal reference level indicates a depth at which horizontal motion is minimal or, at least, where the assumption of a "level of no motion" leads to dynamic height calculations that best match satellite observations.

By systematically investigating this parameter space, we establish a sound empirical basis for selecting the reference level to use in all subsequent analyses comparing MVP and satellite measurements, improving the overall validation methodology.

In [ ]:
## 3.2 Visualization
plt.figure(figsize=(8, 6))
plt.plot(candidate_depths, agg_corr_values, 'o-', markersize=8)
plt.xlabel('Candidate Level of No Motion [m]')
plt.ylabel('Aggregated Pearson r')
plt.title('Aggregated Sensitivity Analysis (reference levels 190-360 m)')
plt.grid(True)
plt.tight_layout()
plt.show()

Integral Code and Plots (II)¶

In [8]:
# 1. IMPORT LIBRARIES
## 1.1 External and standard modules
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import glob, os
from datetime import datetime, timedelta
import gsw
import scipy.io as sio
import geopy.distance
from scipy.spatial import cKDTree
from scipy.stats import pearsonr
import xarray as xr
from scipy.interpolate import griddata
from tqdm import tqdm


# 2. AUXILIARY FUNCTIONS
## 2.1 Scalar and value extraction
def get_scalar(val):
    """Extract a scalar from a possibly nested iterable."""
    while isinstance(val, (list, tuple, np.ndarray)):
        if isinstance(val, np.ndarray):
            if val.size == 1:
                val = val.item()
            else:
                val = val[0]
        else:
            val = val[0]
    return val


def extract_values(nested_array):
    """Flatten a nested array and return the values as a list."""
    values = []

    def recurse(arr):
        for element in arr:
            if isinstance(element, np.ndarray):
                if element.dtype == 'O':
                    recurse(element)
                else:
                    values.append(element.item())
            elif isinstance(element, tuple):
                for item in element:
                    recurse(item)
            else:
                values.append(element)

    recurse(nested_array)
    return values


## 2.2 Data loading utilities
def loadsshuv(fname, rep, varn, **kwargs):
    """
    Load sea-surface height and velocity data from a netCDF file.
    """
    if 'type' in kwargs and kwargs['type'] == 'swot':
        filei2 = glob.glob(rep + fname)[0]
        ds = xr.open_dataset(filei2)
        if 'area' in kwargs:
            area = kwargs['area']
            selection = ((ds.longitude > area[0]) &
                         (ds.longitude < area[2]) &
                         (ds.latitude > area[1]) &
                         (ds.latitude < area[3]))
            ds = ds.where(selection)
        lons = ds.longitude.to_masked_array()
        lats = ds.latitude.to_masked_array()
        us = ds.ugos.to_masked_array()
        vs = ds.vgos.to_masked_array()
        ssha = ds.ssha_filtered.to_masked_array()
        mss = ds.mss.to_masked_array()
        mdt = ds.mdt.to_masked_array()
        mask_row = np.all(np.isnan(us), axis=1)
        u = us[~mask_row, :]
        v = vs[~mask_row, :]
        lon = lons[~mask_row, :]
        lat = lats[~mask_row, :]
        ssha = ssha[~mask_row, :]
        mss = mss[~mask_row, :]
        mdt = mdt[~mask_row, :]
        adt = ssha + mdt
    else:
        filei = glob.glob(rep + fname)[0]
        ds = xr.open_dataset(filei)
        lon = ds[varn['longitude']][:]
        lat = ds[varn['latitude']][:]
        ssha = ds[varn['ssh']][:]
        adt = ds['adt'][:]
        u = ds[varn['u']][:]
        v = ds[varn['v']][:]
    return {'lon': lon, 'lat': lat, 'ssh': ssha, 'u': u, 'v': v,
            'ssha': ssha, 'adt': adt}


## 2.3 Match-up computation
def calc_matchup_for_PL(PL, candidate=None):
    """
    Compute the MVP–SWOT dynamic-height match-up for a single profile.
    """
    try:
        # Part 1 – MVP processing and dynamic-height computation
        mvp_mat = sio.loadmat(f'/home/bessiere/Documents/MVP_TOOLS/BIOSWOTMED/MVP/L2/PL{PL}.mat')
        date_raw = mvp_mat['metadata']['date']
        date_extract = extract_values(date_raw)
        dates_mvp = []
        for d_val in date_extract:
            scalar_val = get_scalar(d_val)
            dt = datetime.fromordinal(int(float(scalar_val))) + timedelta(days=float(scalar_val) % 1) - timedelta(days=366)
            dates_mvp.append(dt)

        mvp_mat2 = sio.loadmat(f'/home/bessiere/Documents/MVP_TOOLS/BIOSWOTMED/MVP/L2_python/PL{PL}_python.mat')
        lon_mvp = mvp_mat2['lon']
        lat_mvp = mvp_mat2['lat']
        p_mvp   = mvp_mat2['pres']
        t_mvp   = mvp_mat2['temp']
        SP_mvp  = mvp_mat2['sal']

        SA = gsw.SA_from_SP(SP_mvp, p_mvp, lon_mvp.T, lat_mvp.T)
        CT = gsw.CT_from_t(SA, t_mvp, p_mvp)
        g_ = 9.81

        dh = []
        for i in range(SA.shape[0]):
            valid = ~np.isnan(SA[i])
            if np.sum(valid) == 0:
                dh.append(np.nan)
                continue
            p_ref = np.nanmax(p_mvp[i][valid]) if candidate is None else candidate
            if np.isnan(p_ref):
                dh.append(np.nan)
                continue
            dh_i = gsw.geo_strf_dyn_height(SA[i][valid], CT[i][valid],
                                           p_mvp[i][valid], p_ref=p_ref)[0] / g_
            if isinstance(dh_i, np.ndarray):
                dh_i = dh_i.item() if dh_i.size == 1 else dh_i[0]
            dh.append(dh_i)
        dh = np.array(dh)
        if np.all(np.isnan(dh)):
            return None, None
        dh_mean = np.nanmean(dh)
        dhc = dh - dh_mean  # centred anomaly

        # Part 2 – Transect construction and cumulative distance
        N = len(dhc)
        df_mvp_raw = pd.DataFrame({
            'dh': dh,
            'dhc': dhc,
            'lon': lon_mvp[0][:N],
            'lat': lat_mvp[0][:N]
        })
        dist = np.zeros(N)
        for i in range(1, N):
            dist[i] = dist[i-1] + geopy.distance.distance(
                (df_mvp_raw['lat'].iloc[i-1], df_mvp_raw['lon'].iloc[i-1]),
                (df_mvp_raw['lat'].iloc[i],   df_mvp_raw['lon'].iloc[i])
            ).km
        df_mvp_raw['dist'] = dist

        # Part 3 – Regular-grid interpolation and smoothing
        step_km = 1      # horizontal sampling interval (km)
        window_size_km = 11  # smoothing window (km)
        dist_max = df_mvp_raw['dist'].iloc[-1]
        dist_reg = np.arange(0, dist_max + step_km, step_km)
        lon_reg = np.interp(dist_reg, df_mvp_raw['dist'], df_mvp_raw['lon'])
        lat_reg = np.interp(dist_reg, df_mvp_raw['dist'], df_mvp_raw['lat'])
        dh_reg  = np.interp(dist_reg, df_mvp_raw['dist'], df_mvp_raw['dh'])
        dhc_reg = np.interp(dist_reg, df_mvp_raw['dist'], df_mvp_raw['dhc'])
        window_size = int(window_size_km / step_km)
        if window_size < 1:
            window_size = 1
        if window_size % 2 == 0:
            window_size += 1
        dh_reg_smooth = (pd.Series(dh_reg)
                         .rolling(window=window_size, center=True)
                         .mean()
                         .bfill()
                         .ffill()
                         .values)
        dhc_reg_smooth = (pd.Series(dhc_reg)
                          .rolling(window=window_size, center=True)
                          .mean()
                          .bfill()
                          .ffill()
                          .values)
        df_mvp = pd.DataFrame({
            'dist': dist_reg,
            'lon': lon_reg,
            'lat': lat_reg,
            'dh': dh_reg_smooth,
            'dhc': dhc_reg_smooth
        })

        # Part 4 – SWOT co-location
        if len(dates_mvp) == 0:
            return None, None
        day_str = dates_mvp[0].strftime('%Y-%m-%d')
        day_dt  = datetime.strptime(day_str, "%Y-%m-%d")
        domain = [3, 38.5, 6.5, 44]
        varn_swot = {'longitude': 'longitude', 'latitude': 'latitude',
                     'u': 'ugos', 'v': 'vgos', 'adt': 'adt'}
        swot_files = glob.glob(f'/home/favaloro/Téléchargements/SWOT_V2.0.0/v2.0.0/'
                               f'SWOT_L3_LR_SSH_Expert_*{day_dt.strftime("%Y%m%d")}*_v2.0.0.nc')
        if len(swot_files) == 0:
            return None, None
        swot = loadsshuv(swot_files[0], '', varn_swot, type='swot', area=domain)
        adt_swot_raw = swot['adt'].flatten()
        lon_swot = swot['lon'].flatten()
        lat_swot = swot['lat'].flatten()
        valid_mask = (~np.isnan(lon_swot) & ~np.isnan(lat_swot) & ~np.isnan(adt_swot_raw))
        lon_swot_valid = lon_swot[valid_mask]
        lat_swot_valid = lat_swot[valid_mask]
        adt_swot_valid = adt_swot_raw[valid_mask]
        if len(lon_swot_valid) == 0:
            return None, None

        tree = cKDTree(np.c_[lon_swot_valid, lat_swot_valid])
        points_mvp = np.c_[df_mvp['lon'].values, df_mvp['lat'].values]
        _, indices = tree.query(points_mvp, k=1)
        swot_adt_match = adt_swot_valid[indices]  # nearest-neighbour values

        swot_mean = np.nanmean(swot_adt_match)
        swot_adt_centered = swot_adt_match - swot_mean

        valid_pts = (~np.isnan(df_mvp['dhc'].values) & ~np.isnan(swot_adt_centered))
        if np.sum(valid_pts) == 0:
            return None, None

        mvp_vals = df_mvp['dhc'].values[valid_pts]
        swot_vals = swot_adt_centered[valid_pts]
        return mvp_vals, swot_vals
    except Exception as e:
        print(f"Error in PL{PL}: {e}")
        return None, None


# 3. AGGREGATED SENSITIVITY ANALYSIS
## 3.1 Parameter sweep
candidate_depths = np.arange(360, 180, -10)  # candidate reference levels (m)
agg_corr_values = []

for cand in candidate_depths:
    agg_mvp_list = []
    agg_swot_list = []
    for PL in np.arange(5, 25):
        mvp_vals, swot_vals = calc_matchup_for_PL(PL, candidate=cand)
        if mvp_vals is not None and len(mvp_vals) > 0:
            agg_mvp_list.append(mvp_vals)
            agg_swot_list.append(swot_vals)
    if len(agg_mvp_list) == 0:
        agg_corr_values.append(np.nan)
    else:
        agg_mvp_all = np.concatenate(agg_mvp_list)
        agg_swot_all = np.concatenate(agg_swot_list)
        if np.std(agg_mvp_all) < 1e-12 or np.std(agg_swot_all) < 1e-12:
            agg_corr_values.append(np.nan)
        else:
            r, _ = pearsonr(agg_mvp_all, agg_swot_all)
            agg_corr_values.append(r)

agg_corr_values = np.array(agg_corr_values)
best_index = np.nanargmax(agg_corr_values)
best_cand_agg = candidate_depths[best_index]
print(f"Best candidate depth (p_ref) is {best_cand_agg} m with Pearson r = {agg_corr_values[best_index]:.3f}")

## 3.2 Visualisation
plt.figure(figsize=(8, 6))
plt.plot(candidate_depths, agg_corr_values, 'o-', markersize=8)
plt.xlabel('Candidate Level of No Motion [m]')
plt.ylabel('Aggregated Pearson r')
plt.title('Aggregated Sensitivity Analysis (190-360 m)')
plt.grid(True)
plt.tight_layout()
plt.show()
Best candidate depth (p_ref) is 330 m with Pearson r = 0.618

3rd Code - Statistical Validation of SWOT Altimetry: Comparative Analysis

This third section of the notebook provides a comprehensive statistical evaluation of SWOT data against in-situ measurements, with particular focus on quantifying the improvements achieved through reference level optimization.

While the previous section identified the optimal reference depth, this code implements a rigorous statistical framework to assess the practical significance of this optimization. Through scatterplots and numerical metrics, it provides a clear picture of how well SWOT captures the spatial variations in sea surface height measured by the MVP.

This statistical validation:

  1. Processes all MVP transects using both the default (maximum depth) and optimized reference levels
  2. Aggregates matched SWOT-MVP data pairs across the entire dataset
  3. Generates scatterplots comparing satellite and in-situ measurements for both reference level approaches
  4. Calculates key statistical metrics including RMSE, Pearson correlation, and regression slope
  5. Presents results in clear visualizations and tabular summaries

By comparing performance metrics between the default and optimized approaches, this analysis quantifies the practical benefits of reference level optimization. The results reveal not only whether SWOT and MVP measurements agree, but also how this agreement is influenced by methodological choices in data processing.

These statistical insights provide crucial guidance for future validation efforts and help establish best practices for comparing satellite altimetry with in-situ oceanographic measurements in the Mediterranean Sea and beyond.

1. Function Definitions

1.1 calc_matchup_for_PL

The calc_matchup_for_PL function serves as the core processing pipeline for our analysis, structured into four logical components that handle the complete workflow from raw data to matched measurement pairs.

This function takes two parameters:

  • PL: The profile (plongée) number to process
  • candidate: Optional reference pressure level (in dbar) for dynamic height calculations

The function performs comprehensive end-to-end processing to:

  1. Convert raw MVP measurements into dynamic height
  2. Organize data spatially along the transect
  3. Apply appropriate interpolation and smoothing
  4. Match with corresponding SWOT observations

This modular design allows us to systematically test different reference pressure levels and evaluate their impact on MVP-SWOT agreement. The function is robust against various error conditions, with appropriate handling for missing data, invalid measurements, and processing exceptions.

By serving as the foundation for both our parameter optimization and statistical validation, this function ensures consistent processing throughout our analysis pipeline, enhancing the reliability of our results.
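The (mvp_vals, swot_vals) / (None, None) contract this implies can be sketched with a small driver, using a stub in place of the real function (the stub, its seed, and its values are illustrative only, not campaign data):

```python
import numpy as np

def calc_matchup_stub(PL, candidate=None):
    """Stand-in for calc_matchup_for_PL: returns (mvp_vals, swot_vals)
    on success, or (None, None) when a transect cannot be processed."""
    if PL % 7 == 0:                               # pretend some transects fail
        return None, None
    rng = np.random.default_rng(PL)
    mvp = rng.normal(0, 0.02, size=30)            # centred dyn. height [m]
    return mvp, mvp + rng.normal(0, 0.01, 30)     # noisy "SWOT" counterpart

mvp_all, swot_all = [], []
for PL in range(5, 25):
    mvp_vals, swot_vals = calc_matchup_stub(PL)
    if mvp_vals is not None and len(mvp_vals) > 0:  # skip failed transects
        mvp_all.append(mvp_vals)
        swot_all.append(swot_vals)
mvp_all = np.concatenate(mvp_all)
swot_all = np.concatenate(swot_all)
```

Because failures simply return (None, None), the aggregation loops in sections 2 and 3 can skip problem transects without any extra error handling.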

In [ ]:
## 1.1 calc_matchup_for_PL

def calc_matchup_for_PL(PL, candidate=None):

1.1.1 MVP Data Loading and Dynamic Height Computation

This section implements the critical first step of our processing pipeline: loading MVP measurements and calculating dynamic height, the oceanographic variable most directly comparable to satellite altimetry.

The function first loads two complementary MATLAB files:

  • The first file contains metadata, particularly temporal information
  • The second file contains the actual oceanographic measurements (temperature, salinity, pressure)

After converting timestamps from MATLAB's date format to Python datetime objects, the code processes the raw measurements using the TEOS-10 standard:

  1. Converting to absolute properties:

    • Practical Salinity (SP) → Absolute Salinity (SA)
    • In-situ temperature → Conservative Temperature (CT)
  2. Dynamic height calculation: Dynamic height is the vertical integral of the specific volume anomaly and is calculated profile-by-profile using:

    $$\Delta D = \frac{1}{g} \int_{0}^{p_{\text{ref}}} \delta(S,T,p) \, dp$$

    Where:

    • $\Delta D$ is the dynamic height
    • $g$ is gravitational acceleration (9.81 m/s²)
    • $\delta$ is the specific volume anomaly
    • $p_{\text{ref}}$ is the reference pressure level
  3. Reference level selection: A key feature is the flexible choice of reference level through the candidate parameter:

    p_ref = np.nanmax(p_mvp[i][valid]) if candidate is None else candidate
    

    This allows testing different reference depths during our sensitivity analysis.

  4. Centering the anomalies: Finally, the function computes centered anomalies by subtracting the mean:

    dhc = dh - np.nanmean(dh)
    

    This facilitates comparison with satellite data by removing systematic offsets.

The code includes robust error handling for missing or invalid data, ensuring that problematic profiles don't compromise the overall analysis.
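As a quick numerical check of the integral above, the sketch below integrates a synthetic, constant specific volume anomaly over pressure (the values are illustrative, not campaign data; pressure is converted from dbar to Pa before integrating):

```python
import numpy as np

g = 9.81                          # gravitational acceleration [m/s^2]
p_ref = 300.0                     # reference pressure [dbar]
p = np.linspace(0.0, p_ref, 301)  # pressure levels [dbar]
delta = np.full_like(p, 1e-7)     # specific volume anomaly [m^3/kg]

# Geopotential anomaly: trapezoidal integral of delta over pressure in Pa
# (1 dbar = 1e4 Pa), then divide by g to express the result in metres.
phi = np.sum(0.5 * (delta[1:] + delta[:-1]) * np.diff(p) * 1e4)  # [m^2/s^2]
dyn_height = phi / g                                             # [m]
```

For a constant δ = 1e-7 m³/kg over 300 dbar this gives φ = 0.3 m²/s², i.e. roughly 3 cm of dynamic height, a few centimetres, which is the order of magnitude of the mesoscale signals compared in this notebook.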

In [ ]:
### 1.1.1 MVP data loading and dynamic-height computation
    try:
        mvp_mat = sio.loadmat(f'/home/bessiere/Documents/MVP_TOOLS/BIOSWOTMED/MVP/L2/PL{PL}.mat')
        date_raw = mvp_mat['metadata']['date']
        date_extract = extract_values(date_raw)
        dates_mvp = []
        for d_val in date_extract:
            scalar_val = get_scalar(d_val)
            dt = datetime.fromordinal(int(float(scalar_val))) + timedelta(days=float(scalar_val) % 1) - timedelta(days=366)
            dates_mvp.append(dt)

        mvp_mat2 = sio.loadmat(f'/home/bessiere/Documents/MVP_TOOLS/BIOSWOTMED/MVP/L2_python/PL{PL}_python.mat')
        lon_mvp = mvp_mat2['lon']
        lat_mvp = mvp_mat2['lat']
        p_mvp   = mvp_mat2['pres']
        t_mvp   = mvp_mat2['temp']
        SP_mvp  = mvp_mat2['sal']

        SA = gsw.SA_from_SP(SP_mvp, p_mvp, lon_mvp.T, lat_mvp.T)
        CT = gsw.CT_from_t(SA, t_mvp, p_mvp)
        g_ = 9.81

        dh = []
        for i in range(SA.shape[0]):
            valid = ~np.isnan(SA[i])
            if np.sum(valid) == 0:
                dh.append(np.nan)
                continue
            p_ref = np.nanmax(p_mvp[i][valid]) if candidate is None else candidate  # reference level
            if np.isnan(p_ref):
                dh.append(np.nan)
                continue
            dh_i = gsw.geo_strf_dyn_height(SA[i][valid], CT[i][valid],
                                           p_mvp[i][valid], p_ref=p_ref)[0] / g_
            if isinstance(dh_i, np.ndarray):
                dh_i = dh_i.item() if dh_i.size == 1 else dh_i[0]
            dh.append(dh_i)
        dh = np.array(dh)
        if np.all(np.isnan(dh)):
            return None, None
        dhc = dh - np.nanmean(dh)  # centred anomaly

1.1.2 Transect Construction and Cumulative Distance

In this section, the code organizes the dynamic height data into a structured format and calculates along-track distances, creating a well-defined spatial reference system for the MVP transect.

The process consists of two main steps:

  1. Data organization: The code creates a pandas DataFrame containing:

    • Raw dynamic height (dh)
    • Centered dynamic height anomalies (dhc)
    • Geographic coordinates (longitude and latitude)

    This structured format facilitates subsequent processing and analysis.

  2. Cumulative distance calculation: For each point along the transect, the code calculates the cumulative distance from the starting point:

    dist[i] = dist[i-1] + geopy.distance.distance(
        (df_mvp_raw['lat'].iloc[i-1], df_mvp_raw['lon'].iloc[i-1]),
        (df_mvp_raw['lat'].iloc[i],   df_mvp_raw['lon'].iloc[i])
    ).km
    

    The function uses geodesic distances that account for Earth's curvature, which is more accurate than simple Euclidean calculations, especially for longer transects.

The resulting distance array is added to the DataFrame, providing a natural coordinate system for the MVP measurements. This distance metric serves as the independent variable for subsequent interpolation and facilitates consistent spatial comparisons with satellite data.

By establishing this transect representation, we create a foundation for the regular-grid interpolation and smoothing operations that follow in the next section of the processing pipeline.
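A self-contained sketch of the cumulative-distance step, substituting a pure-NumPy haversine (great-circle) distance for geopy's geodesic one (the coordinates below are illustrative, not campaign data):

```python
import numpy as np

def haversine_km(lat1, lon1, lat2, lon2, R=6371.0):
    """Great-circle distance [km] between points given in decimal degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * R * np.arcsin(np.sqrt(a))

# Illustrative transect coordinates
lat = np.array([42.0, 42.1, 42.2, 42.3])
lon = np.array([5.0, 5.05, 5.10, 5.15])

seg = haversine_km(lat[:-1], lon[:-1], lat[1:], lon[1:])   # segment lengths
dist = np.concatenate(([0.0], np.cumsum(seg)))             # cumulative [km]
```

The haversine formula ignores the Earth's flattening, so it can differ from the geodesic distance by up to a few tenths of a percent, which is negligible at transect scale.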

In [ ]:
### 1.1.2 Transect construction and cumulative distance
        N = len(dhc)
        df_mvp_raw = pd.DataFrame({
            'dh': dh,
            'dhc': dhc,
            'lon': lon_mvp[0][:N],
            'lat': lat_mvp[0][:N]
        })
        dist = np.zeros(N)
        for i in range(1, N):
            dist[i] = dist[i-1] + geopy.distance.distance(
                (df_mvp_raw['lat'].iloc[i-1], df_mvp_raw['lon'].iloc[i-1]),
                (df_mvp_raw['lat'].iloc[i],   df_mvp_raw['lon'].iloc[i])
            ).km
        df_mvp_raw['dist'] = dist

1.1.3 Regular-Grid Interpolation and Smoothing

This section performs critical spatial preprocessing to ensure consistent sampling and noise reduction in the MVP data. The process involves two key operations:

  1. Regular-grid interpolation: The code converts irregularly spaced measurements to a uniform 1-kilometer grid:

    dist_reg = np.arange(0, dist_max + step_km, step_km)
    lon_reg = np.interp(dist_reg, df_mvp_raw['dist'], df_mvp_raw['lon'])
    lat_reg = np.interp(dist_reg, df_mvp_raw['dist'], df_mvp_raw['lat'])
    

    This regular spacing is essential for:

    • Consistent comparison with satellite data
    • Robust statistical analysis
    • Effective application of smoothing filters
  2. Moving-window smoothing: To reduce high-frequency noise while preserving mesoscale features, the code applies an 11-kilometer centered moving average filter:

    dh_reg_smooth = (pd.Series(dh_reg)
                    .rolling(window=window_size, center=True)
                    .mean()
                    .bfill()
                    .ffill()
                    .values)
    

    The function ensures proper handling of edge effects through backward and forward filling of missing values.

The smoothing window size (11 km) represents a careful balance between:

  • Reducing noise that could obscure the correlation with satellite data
  • Preserving oceanographic features of interest (mesoscale eddies and fronts)
  • Aligning with typical satellite altimetry effective resolution

Finally, the processed data is organized into a new DataFrame with regular spacing and smoothed values, ready for comparison with satellite observations. This preprocessing step is crucial for fair comparison between in-situ and satellite measurements, as it accounts for their different inherent spatial resolutions and sampling patterns.
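The regrid-and-smooth pattern can be reproduced on synthetic data as follows (the 1 km step matches the notebook; the window is shortened to 5 km here to suit the toy transect):

```python
import numpy as np
import pandas as pd

# Irregular along-track samples (synthetic, for illustration)
dist_raw = np.array([0.0, 0.7, 2.2, 3.1, 4.9, 6.4, 8.0])  # [km]
dh_raw = np.sin(dist_raw / 3.0)                            # toy signal

# 1 km regular grid, then a centred moving average with an odd window
step_km, window_size_km = 1, 5
dist_reg = np.arange(0, dist_raw[-1] + step_km, step_km)
dh_reg = np.interp(dist_reg, dist_raw, dh_raw)

window = max(1, int(window_size_km / step_km))
if window % 2 == 0:
    window += 1
dh_smooth = (pd.Series(dh_reg)
             .rolling(window=window, center=True)
             .mean()
             .bfill()    # fill the NaNs the centred window leaves at the start
             .ffill()    # ...and at the end
             .values)
```

The bfill/ffill pair replaces the edge NaNs produced by the centred window with the nearest smoothed value, so the output has the same length as the regular grid.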

In [ ]:
### 1.1.3 Regular-grid interpolation and smoothing
        step_km = 1       # horizontal sampling interval
        window_size_km = 11
        dist_max = df_mvp_raw['dist'].iloc[-1]
        dist_reg = np.arange(0, dist_max + step_km, step_km)
        lon_reg = np.interp(dist_reg, df_mvp_raw['dist'], df_mvp_raw['lon'])
        lat_reg = np.interp(dist_reg, df_mvp_raw['dist'], df_mvp_raw['lat'])
        dh_reg  = np.interp(dist_reg, df_mvp_raw['dist'], df_mvp_raw['dh'])
        dhc_reg = np.interp(dist_reg, df_mvp_raw['dist'], df_mvp_raw['dhc'])
        window_size = int(window_size_km / step_km)
        if window_size < 1:
            window_size = 1
        if window_size % 2 == 0:
            window_size += 1
        dh_reg_smooth = (pd.Series(dh_reg)
                         .rolling(window=window_size, center=True)
                         .mean()
                         .bfill()
                         .ffill()
                         .values)
        dhc_reg_smooth = (pd.Series(dhc_reg)
                          .rolling(window=window_size, center=True)
                          .mean()
                          .bfill()
                          .ffill()
                          .values)
        df_mvp = pd.DataFrame({
            'dist': dist_reg,
            'lon': lon_reg,
            'lat': lat_reg,
            'dh': dh_reg_smooth,
            'dhc': dhc_reg_smooth
        })

1.1.4 SWOT Colocation and Output

The final part of the processing pipeline performs the critical task of finding and matching SWOT satellite observations to our processed MVP data points. This spatial and temporal alignment enables direct point-by-point comparison between the two measurement systems.

The process follows a systematic approach:

  1. Temporal matching: The code identifies SWOT data files from the same day as the MVP measurements:

    swot_files = glob.glob(f'...SWOT_L3_LR_SSH_Expert_*{day_dt.strftime("%Y%m%d")}*_v2.0.0.nc')
    
  2. Spatial subsetting: It loads SWOT data within the Northwestern Mediterranean domain (3°-6.5°E, 38.5°-44°N) and extracts valid measurements:

    valid_mask = (~np.isnan(lon_swot) & ~np.isnan(lat_swot) & ~np.isnan(adt_swot_raw))
    
  3. Nearest-neighbor matching: Using a k-d tree spatial index, the code efficiently finds the closest SWOT observation for each MVP point:

    tree = cKDTree(np.c_[lon_swot_valid, lat_swot_valid])
    _, indices = tree.query(points_mvp, k=1)
    

    This keeps the nearest-neighbour lookup efficient, even with the large number of points in SWOT's wide-swath data.

  4. Data centering and filtering: The SWOT ADT values are centered by subtracting their mean, and only valid point pairs are retained:

    swot_adt_centered = adt_swot_valid[indices] - np.nanmean(adt_swot_valid[indices])
    valid_pts = (~np.isnan(df_mvp['dhc'].values) & ~np.isnan(swot_adt_centered))
    

The function's output consists of two matched arrays containing:

  • Centered dynamic height from MVP measurements
  • Centered absolute dynamic topography from SWOT observations

These paired values form the foundation for our correlation analysis, enabling quantitative assessment of how well the satellite measurements agree with in-situ data, and how this agreement varies with different reference pressure levels.
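A minimal, self-contained sketch of the nearest-neighbour matching and centring, with made-up coordinates and ADT values; note that, as in the notebook, the k-d tree operates on raw longitude/latitude degrees, which is acceptable over a domain this small:

```python
import numpy as np
from scipy.spatial import cKDTree

# Synthetic "SWOT" grid points and ADT values (illustrative only)
lon_swot = np.array([5.00, 5.10, 5.20, 5.30])
lat_swot = np.array([42.00, 42.10, 42.20, 42.30])
adt_swot = np.array([0.10, 0.12, 0.15, 0.11])

# Synthetic "MVP" track points to be matched
lon_mvp = np.array([5.02, 5.19, 5.31])
lat_mvp = np.array([42.01, 42.21, 42.29])

tree = cKDTree(np.c_[lon_swot, lat_swot])                # spatial index
_, idx = tree.query(np.c_[lon_mvp, lat_mvp], k=1)        # nearest SWOT point
adt_match = adt_swot[idx]
adt_centred = adt_match - np.nanmean(adt_match)          # remove the mean offset
```

Centring both series on their own means removes any constant offset between the geoid-referenced SWOT ADT and the arbitrarily referenced MVP dynamic height, so only spatial variations are compared.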

In [ ]:
### 1.1.4 SWOT colocation and output
        if len(dates_mvp) == 0:
            return None, None
        day_dt = datetime.strptime(dates_mvp[0].strftime('%Y-%m-%d'), "%Y-%m-%d")
        domain = [3, 38.5, 6.5, 44]
        varn_swot = {'longitude': 'longitude', 'latitude': 'latitude',
                     'u': 'ugos', 'v': 'vgos', 'adt': 'adt'}
        swot_files = glob.glob(f'/home/favaloro/Téléchargements/SWOT_V2.0.0/v2.0.0/'
                               f'SWOT_L3_LR_SSH_Expert_*{day_dt.strftime("%Y%m%d")}*_v2.0.0.nc')
        if len(swot_files) == 0:
            return None, None
        swot = loadsshuv(swot_files[0], '', varn_swot, type='swot', area=domain)
        adt_swot_raw = swot['adt'].flatten()
        lon_swot = swot['lon'].flatten()
        lat_swot = swot['lat'].flatten()
        valid_mask = (~np.isnan(lon_swot) & ~np.isnan(lat_swot) & ~np.isnan(adt_swot_raw))
        lon_swot_valid = lon_swot[valid_mask]
        lat_swot_valid = lat_swot[valid_mask]
        adt_swot_valid = adt_swot_raw[valid_mask]
        if len(lon_swot_valid) == 0:
            return None, None

        tree = cKDTree(np.c_[lon_swot_valid, lat_swot_valid])
        points_mvp = np.c_[df_mvp['lon'].values, df_mvp['lat'].values]
        _, indices = tree.query(points_mvp, k=1)
        swot_adt_centered = adt_swot_valid[indices] - np.nanmean(adt_swot_valid[indices])

        valid_pts = (~np.isnan(df_mvp['dhc'].values) & ~np.isnan(swot_adt_centered))
        if np.sum(valid_pts) == 0:
            return None, None

        return df_mvp['dhc'].values[valid_pts], swot_adt_centered[valid_pts]
    except Exception as e:
        print(f"Error in PL{PL}: {e}")
        return None, None

2. Scatterplot and Statistical Analysis

2.1 Default Reference Level (p_max)

This section evaluates the agreement between MVP and SWOT measurements using the default reference level approach, where dynamic height is calculated relative to the maximum pressure (deepest measurement) for each profile.

The analysis consists of three main components:

  1. Data aggregation across all 20 transects (PL5-PL24), creating a comprehensive dataset for robust statistical analysis.

  2. Statistical parameter calculation to quantify the agreement between in-situ and satellite measurements:

    • Linear regression: Fits a model $y = mx + b$ to determine how well SWOT data predicts MVP measurements

    • Root Mean Square Error (RMSE): Quantifies the average magnitude of the point-by-point discrepancies between matched MVP ($y_i$) and SWOT ($x_i$) values: $$\text{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i - x_i)^2}$$

    • Pearson correlation coefficient: Measures the strength of the linear relationship: $$r = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2 \sum_{i=1}^{n}(y_i-\bar{y})^2}}$$

    • Standard deviations: Compare the variability in each dataset

  3. Visualization through a scatterplot showing:

    • Individual data points (MVP vs. SWOT measurements)
    • A 1:1 reference line (perfect agreement)
    • The fitted regression line
    • A table summarizing the statistical metrics

This baseline assessment using the traditional maximum-pressure reference level establishes a reference point for evaluating the impact of our optimized reference level. The deviation of the regression slope from 1.0 indicates systematic differences in the amplitude of measurements between the two systems, while the correlation coefficient quantifies the overall pattern agreement regardless of amplitude.

The statistical metrics from this analysis will be directly compared with those from the optimized reference level to demonstrate the practical significance of our parameter optimization approach.
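The metric computations themselves can be sketched on synthetic matched pairs (the arrays, noise levels, and seed below are illustrative only):

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
swot = rng.normal(0, 0.03, 200)               # synthetic centred ADT [m]
mvp = 0.8 * swot + rng.normal(0, 0.01, 200)   # correlated in-situ counterpart

slope, intercept = np.polyfit(swot, mvp, 1)   # regression of MVP on SWOT
rmse = np.sqrt(np.mean((mvp - swot) ** 2))    # point-by-point RMSE
r, p_value = pearsonr(mvp, swot)              # Pearson correlation
std_mvp = np.std(mvp, ddof=1)                 # sample standard deviations
std_swot = np.std(swot, ddof=1)
```

On real data these same quantities populate the summary tables shown below each scatterplot; the slope below 1 in this toy example mimics an in-situ signal with damped amplitude relative to the satellite one.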

In [ ]:
## 2.1 Default reference level (p_max)

agg_mvp_def = []
agg_swot_def = []
for PL in np.arange(5, 25):
    mvp_vals, swot_vals = calc_matchup_for_PL(PL)
    if mvp_vals is not None and len(mvp_vals) > 0:
        agg_mvp_def.append(mvp_vals)
        agg_swot_def.append(swot_vals)
agg_mvp_all_def = np.concatenate(agg_mvp_def) if len(agg_mvp_def) else np.array([])
agg_swot_all_def = np.concatenate(agg_swot_def) if len(agg_swot_def) else np.array([])

if len(agg_swot_all_def):
    slope_def, intercept_def = np.polyfit(agg_swot_all_def, agg_mvp_all_def, 1)
    x_fit_def = np.linspace(np.nanmin(agg_swot_all_def), np.nanmax(agg_swot_all_def), 100)
    y_fit_def = slope_def * x_fit_def + intercept_def
    rmse_def = np.sqrt(np.mean((agg_mvp_all_def - agg_swot_all_def) ** 2))
    r_def, _ = pearsonr(agg_mvp_all_def, agg_swot_all_def)
    std_mvp_def = np.std(agg_mvp_all_def, ddof=1)
    std_swot_def = np.std(agg_swot_all_def, ddof=1)
else:
    slope_def = intercept_def = rmse_def = r_def = std_mvp_def = std_swot_def = np.nan

plt.figure(figsize=(6, 6))
plt.scatter(agg_swot_all_def, agg_mvp_all_def, alpha=0.5, label='Data points')
plt.plot([np.nanmin(agg_swot_all_def), np.nanmax(agg_swot_all_def)],
         [np.nanmin(agg_swot_all_def), np.nanmax(agg_swot_all_def)], 'k--', label='1:1 line')
plt.plot(x_fit_def, y_fit_def, 'r-', label='Regression line')
plt.xlabel('ADT SWOT (centred)')
plt.ylabel('ADT MVP (centred, p_max)')
plt.title('Scatter Plot: p_ref = p_max')
plt.legend()
plt.grid(True)
plt.tight_layout()
plt.show()

fig, ax = plt.subplots(figsize=(6, 2))
ax.axis('tight')
ax.axis('off')
table_def = ax.table(
    cellText=[['RMSE', f'{rmse_def:.3f}'],
              ['Pearson r', f'{r_def:.3f}'],
              ['Slope', f'{slope_def:.3f}'],
              ['Std MVP', f'{std_mvp_def:.3f}'],
              ['Std SWOT', f'{std_swot_def:.3f}']],
    colLabels=['Statistic', 'Value'],
    loc='center'
)
table_def.auto_set_font_size(False)
table_def.set_fontsize(10)
ax.set_title('p_ref = p_max')
plt.show()

2.2 Optimal Reference Level (p_optimal)

This section applies our optimized reference level (best_cand_agg) determined from the previous sensitivity analysis to evaluate whether it improves the agreement between MVP and SWOT measurements. Using the optimal reference level rather than the traditional maximum pressure approach represents a key methodological advancement in satellite altimetry validation.

The analysis follows the same structure as the previous section, but now uses the optimized reference pressure for dynamic height calculations. This allows for direct comparison of results to quantify the improvement achieved through parameter optimization.

The scatterplot visualization uses green points to visually distinguish it from the default case, while maintaining the same statistical framework. The fitted regression line shows how the linear relationship between MVP and SWOT measurements changes when using the optimized reference level.

From an oceanographic perspective, the optimal reference level represents the depth at which the assumption of "no motion" produces dynamic height calculations that best match satellite observations. This has important implications for our understanding of the region's vertical structure:

  1. If the optimal depth is significantly different from typical maximum cast depths, it suggests that substantial motion exists at deeper levels

  2. The improvement in correlation (if any) quantifies how much the reference level choice affects our ability to validate satellite measurements

  3. Changes in the regression slope indicate whether the optimal reference level helps align the amplitude of variations between the two measurement systems

These statistical metrics provide concrete evidence for the practical significance of our reference level optimization. By demonstrating a quantitative improvement in the agreement between satellite and in-situ measurements, we establish a more robust methodology for SWOT validation that can be applied to other regions and campaigns.
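A side-by-side summary of the two configurations can be tabulated with pandas; the numbers below are placeholders for illustration only, whereas in the notebook they would come from the variables computed in sections 2.1 (rmse_def, r_def, slope_def) and 2.2 (rmse_opt, r_opt, slope_opt):

```python
import pandas as pd

# Placeholder metric values (NOT campaign results)
summary = pd.DataFrame(
    {'p_ref = p_max':     [0.025, 0.55, 0.70],
     'p_ref = p_optimal': [0.021, 0.62, 0.85]},
    index=['RMSE [m]', 'Pearson r', 'Slope'])
print(summary)
```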

In [ ]:
## 2.2 Optimal reference level (p_optimal)

agg_mvp_opt = []
agg_swot_opt = []
for PL in np.arange(5, 25):
    mvp_vals, swot_vals = calc_matchup_for_PL(PL, candidate=best_cand_agg)
    if mvp_vals is not None and len(mvp_vals) > 0:
        agg_mvp_opt.append(mvp_vals)
        agg_swot_opt.append(swot_vals)
agg_mvp_all_opt = np.concatenate(agg_mvp_opt) if len(agg_mvp_opt) else np.array([])
agg_swot_all_opt = np.concatenate(agg_swot_opt) if len(agg_swot_opt) else np.array([])

if len(agg_swot_all_opt):
    slope_opt, intercept_opt = np.polyfit(agg_swot_all_opt, agg_mvp_all_opt, 1)
    x_fit_opt = np.linspace(np.nanmin(agg_swot_all_opt), np.nanmax(agg_swot_all_opt), 100)
    y_fit_opt = slope_opt * x_fit_opt + intercept_opt
    rmse_opt = np.sqrt(np.mean((agg_mvp_all_opt - agg_swot_all_opt) ** 2))
    r_opt, _ = pearsonr(agg_mvp_all_opt, agg_swot_all_opt)
    std_mvp_opt = np.std(agg_mvp_all_opt, ddof=1)
    std_swot_opt = np.std(agg_swot_all_opt, ddof=1)
else:
    slope_opt = intercept_opt = rmse_opt = r_opt = std_mvp_opt = std_swot_opt = np.nan

plt.figure(figsize=(6, 6))
plt.scatter(agg_swot_all_opt, agg_mvp_all_opt, c='green', alpha=0.5, label='Data points')
plt.plot([np.nanmin(agg_swot_all_opt), np.nanmax(agg_swot_all_opt)],
         [np.nanmin(agg_swot_all_opt), np.nanmax(agg_swot_all_opt)], 'k--', label='1:1 line')
plt.plot(x_fit_opt, y_fit_opt, 'r-', label='Regression line')
plt.xlabel('ADT SWOT (centred)')
plt.ylabel('ADT MVP (centred, p_optimal)')
plt.title('Scatter Plot: p_ref = p_optimal')
plt.legend()
plt.grid(True)
plt.tight_layout()
plt.show()

fig, ax = plt.subplots(figsize=(6, 2))
ax.axis('tight')
ax.axis('off')
table_opt = ax.table(
    cellText=[['RMSE', f'{rmse_opt:.3f}'],
              ['Pearson r', f'{r_opt:.3f}'],
              ['Slope', f'{slope_opt:.3f}'],
              ['Std MVP', f'{std_mvp_opt:.3f}'],
              ['Std SWOT', f'{std_swot_opt:.3f}']],
    colLabels=['Statistic', 'Value'],
    loc='center'
)
table_opt.auto_set_font_size(False)
table_opt.set_fontsize(10)
ax.set_title('p_ref = p_optimal')
plt.show()

Integral Code and Plots (III)

In [9]:
# 1. FUNCTION DEFINITIONS
## 1.1 calc_matchup_for_PL

def calc_matchup_for_PL(PL, candidate=None):
    # ### 1.1.1 MVP data loading and dynamic-height computation
    try:
        mvp_mat = sio.loadmat(f'/home/bessiere/Documents/MVP_TOOLS/BIOSWOTMED/MVP/L2/PL{PL}.mat')
        date_raw = mvp_mat['metadata']['date']
        date_extract = extract_values(date_raw)
        dates_mvp = []
        for d_val in date_extract:
            scalar_val = get_scalar(d_val)
            dt = datetime.fromordinal(int(float(scalar_val))) + timedelta(days=float(scalar_val) % 1) - timedelta(days=366)
            dates_mvp.append(dt)

        mvp_mat2 = sio.loadmat(f'/home/bessiere/Documents/MVP_TOOLS/BIOSWOTMED/MVP/L2_python/PL{PL}_python.mat')
        lon_mvp = mvp_mat2['lon']
        lat_mvp = mvp_mat2['lat']
        p_mvp   = mvp_mat2['pres']
        t_mvp   = mvp_mat2['temp']
        SP_mvp  = mvp_mat2['sal']

        SA = gsw.SA_from_SP(SP_mvp, p_mvp, lon_mvp.T, lat_mvp.T)
        CT = gsw.CT_from_t(SA, t_mvp, p_mvp)
        g_ = 9.81

        dh = []
        for i in range(SA.shape[0]):
            valid = ~np.isnan(SA[i])
            if np.sum(valid) == 0:
                dh.append(np.nan)
                continue
            p_ref = np.nanmax(p_mvp[i][valid]) if candidate is None else candidate  # reference level
            if np.isnan(p_ref):
                dh.append(np.nan)
                continue
            dh_i = gsw.geo_strf_dyn_height(SA[i][valid], CT[i][valid],
                                           p_mvp[i][valid], p_ref=p_ref)[0] / g_
            if isinstance(dh_i, np.ndarray):
                dh_i = dh_i.item() if dh_i.size == 1 else dh_i[0]
            dh.append(dh_i)
        dh = np.array(dh)
        if np.all(np.isnan(dh)):
            return None, None
        dhc = dh - np.nanmean(dh)  # centred anomaly

        # ### 1.1.2 Transect construction and cumulative distance
        N = len(dhc)
        df_mvp_raw = pd.DataFrame({
            'dh': dh,
            'dhc': dhc,
            'lon': lon_mvp[0][:N],
            'lat': lat_mvp[0][:N]
        })
        dist = np.zeros(N)
        for i in range(1, N):
            dist[i] = dist[i-1] + geopy.distance.distance(
                (df_mvp_raw['lat'].iloc[i-1], df_mvp_raw['lon'].iloc[i-1]),
                (df_mvp_raw['lat'].iloc[i],   df_mvp_raw['lon'].iloc[i])
            ).km
        df_mvp_raw['dist'] = dist

        # ### 1.1.3 Regular-grid interpolation and smoothing
        step_km = 1       # horizontal sampling interval
        window_size_km = 11
        dist_max = df_mvp_raw['dist'].iloc[-1]
        dist_reg = np.arange(0, dist_max + step_km, step_km)
        lon_reg = np.interp(dist_reg, df_mvp_raw['dist'], df_mvp_raw['lon'])
        lat_reg = np.interp(dist_reg, df_mvp_raw['dist'], df_mvp_raw['lat'])
        dh_reg  = np.interp(dist_reg, df_mvp_raw['dist'], df_mvp_raw['dh'])
        dhc_reg = np.interp(dist_reg, df_mvp_raw['dist'], df_mvp_raw['dhc'])
        window_size = int(window_size_km / step_km)
        if window_size < 1:
            window_size = 1
        if window_size % 2 == 0:
            window_size += 1
        dh_reg_smooth = (pd.Series(dh_reg)
                         .rolling(window=window_size, center=True)
                         .mean()
                         .bfill()
                         .ffill()
                         .values)
        dhc_reg_smooth = (pd.Series(dhc_reg)
                          .rolling(window=window_size, center=True)
                          .mean()
                          .bfill()
                          .ffill()
                          .values)
        df_mvp = pd.DataFrame({
            'dist': dist_reg,
            'lon': lon_reg,
            'lat': lat_reg,
            'dh': dh_reg_smooth,
            'dhc': dhc_reg_smooth
        })

        # ### 1.1.4 SWOT colocation and output
        if len(dates_mvp) == 0:
            return None, None
        day_dt = datetime.strptime(dates_mvp[0].strftime('%Y-%m-%d'), "%Y-%m-%d")
        domain = [3, 38.5, 6.5, 44]
        varn_swot = {'longitude': 'longitude', 'latitude': 'latitude',
                     'u': 'ugos', 'v': 'vgos', 'adt': 'adt'}
        swot_files = glob.glob(f'/home/favaloro/Téléchargements/SWOT_V2.0.0/v2.0.0/'
                               f'SWOT_L3_LR_SSH_Expert_*{day_dt.strftime("%Y%m%d")}*_v2.0.0.nc')
        if len(swot_files) == 0:
            return None, None
        swot = loadsshuv(swot_files[0], '', varn_swot, type='swot', area=domain)
        adt_swot_raw = swot['adt'].flatten()
        lon_swot = swot['lon'].flatten()
        lat_swot = swot['lat'].flatten()
        valid_mask = (~np.isnan(lon_swot) & ~np.isnan(lat_swot) & ~np.isnan(adt_swot_raw))
        lon_swot_valid = lon_swot[valid_mask]
        lat_swot_valid = lat_swot[valid_mask]
        adt_swot_valid = adt_swot_raw[valid_mask]
        if len(lon_swot_valid) == 0:
            return None, None

        tree = cKDTree(np.c_[lon_swot_valid, lat_swot_valid])
        points_mvp = np.c_[df_mvp['lon'].values, df_mvp['lat'].values]
        _, indices = tree.query(points_mvp, k=1)
        swot_adt_centered = adt_swot_valid[indices] - np.nanmean(adt_swot_valid[indices])

        valid_pts = (~np.isnan(df_mvp['dhc'].values) & ~np.isnan(swot_adt_centered))
        if np.sum(valid_pts) == 0:
            return None, None

        return df_mvp['dhc'].values[valid_pts], swot_adt_centered[valid_pts]
    except Exception as e:
        print(f"Error in PL{PL}: {e}")
        return None, None


# 2. SCATTERPLOT AND STATISTICS
## 2.1 Default reference level (p_max)

agg_mvp_def = []
agg_swot_def = []
for PL in np.arange(5, 25):
    mvp_vals, swot_vals = calc_matchup_for_PL(PL)
    if mvp_vals is not None and len(mvp_vals) > 0:
        agg_mvp_def.append(mvp_vals)
        agg_swot_def.append(swot_vals)
agg_mvp_all_def = np.concatenate(agg_mvp_def) if len(agg_mvp_def) else np.array([])
agg_swot_all_def = np.concatenate(agg_swot_def) if len(agg_swot_def) else np.array([])

if len(agg_swot_all_def):
    slope_def, intercept_def = np.polyfit(agg_swot_all_def, agg_mvp_all_def, 1)
    x_fit_def = np.linspace(np.nanmin(agg_swot_all_def), np.nanmax(agg_swot_all_def), 100)
    y_fit_def = slope_def * x_fit_def + intercept_def
    rmse_def = np.sqrt(np.mean((agg_mvp_all_def - agg_swot_all_def) ** 2))
    r_def, _ = pearsonr(agg_mvp_all_def, agg_swot_all_def)
    std_mvp_def = np.std(agg_mvp_all_def, ddof=1)
    std_swot_def = np.std(agg_swot_all_def, ddof=1)
else:
    slope_def = intercept_def = rmse_def = r_def = std_mvp_def = std_swot_def = np.nan

# Skip plotting when no valid matchups were found
if len(agg_swot_all_def):
    plt.figure(figsize=(6, 6))
    plt.scatter(agg_swot_all_def, agg_mvp_all_def, alpha=0.5, label='Data points')
    plt.plot([np.nanmin(agg_swot_all_def), np.nanmax(agg_swot_all_def)],
             [np.nanmin(agg_swot_all_def), np.nanmax(agg_swot_all_def)], 'k--', label='1:1 line')
    plt.plot(x_fit_def, y_fit_def, 'r-', label='Regression line')
    plt.xlabel('ADT SWOT (centred)')
    plt.ylabel('ADT MVP (centred, p_max)')
    plt.title('Scatter Plot: p_ref = p_max')
    plt.legend()
    plt.grid(True)
    plt.tight_layout()
    plt.show()

fig, ax = plt.subplots(figsize=(6, 2))
ax.axis('tight')
ax.axis('off')
table_def = ax.table(
    cellText=[['RMSE', f'{rmse_def:.3f}'],
              ['Pearson r', f'{r_def:.3f}'],
              ['Slope', f'{slope_def:.3f}'],
              ['Std MVP', f'{std_mvp_def:.3f}'],
              ['Std SWOT', f'{std_swot_def:.3f}']],
    colLabels=['Statistic', 'Value'],
    loc='center'
)
table_def.auto_set_font_size(False)
table_def.set_fontsize(10)
ax.set_title('p_ref = p_max')
plt.show()


## 2.2 Optimal reference level (p_optimal)

agg_mvp_opt = []
agg_swot_opt = []
for PL in np.arange(5, 25):
    mvp_vals, swot_vals = calc_matchup_for_PL(PL, candidate=best_cand_agg)
    if mvp_vals is not None and len(mvp_vals) > 0:
        agg_mvp_opt.append(mvp_vals)
        agg_swot_opt.append(swot_vals)
agg_mvp_all_opt = np.concatenate(agg_mvp_opt) if len(agg_mvp_opt) else np.array([])
agg_swot_all_opt = np.concatenate(agg_swot_opt) if len(agg_swot_opt) else np.array([])

if len(agg_swot_all_opt):
    slope_opt, intercept_opt = np.polyfit(agg_swot_all_opt, agg_mvp_all_opt, 1)
    x_fit_opt = np.linspace(np.nanmin(agg_swot_all_opt), np.nanmax(agg_swot_all_opt), 100)
    y_fit_opt = slope_opt * x_fit_opt + intercept_opt
    rmse_opt = np.sqrt(np.mean((agg_mvp_all_opt - agg_swot_all_opt) ** 2))
    r_opt, _ = pearsonr(agg_mvp_all_opt, agg_swot_all_opt)
    std_mvp_opt = np.std(agg_mvp_all_opt, ddof=1)
    std_swot_opt = np.std(agg_swot_all_opt, ddof=1)
else:
    slope_opt = intercept_opt = rmse_opt = r_opt = std_mvp_opt = std_swot_opt = np.nan

# Skip plotting when no valid matchups were found
if len(agg_swot_all_opt):
    plt.figure(figsize=(6, 6))
    plt.scatter(agg_swot_all_opt, agg_mvp_all_opt, c='green', alpha=0.5, label='Data points')
    plt.plot([np.nanmin(agg_swot_all_opt), np.nanmax(agg_swot_all_opt)],
             [np.nanmin(agg_swot_all_opt), np.nanmax(agg_swot_all_opt)], 'k--', label='1:1 line')
    plt.plot(x_fit_opt, y_fit_opt, 'r-', label='Regression line')
    plt.xlabel('ADT SWOT (centred)')
    plt.ylabel('ADT MVP (centred, p_optimal)')
    plt.title('Scatter Plot: p_ref = p_optimal')
    plt.legend()
    plt.grid(True)
    plt.tight_layout()
    plt.show()

fig, ax = plt.subplots(figsize=(6, 2))
ax.axis('tight')
ax.axis('off')
table_opt = ax.table(
    cellText=[['RMSE', f'{rmse_opt:.3f}'],
              ['Pearson r', f'{r_opt:.3f}'],
              ['Slope', f'{slope_opt:.3f}'],
              ['Std MVP', f'{std_mvp_opt:.3f}'],
              ['Std SWOT', f'{std_swot_opt:.3f}']],
    colLabels=['Statistic', 'Value'],
    loc='center'
)
table_opt.auto_set_font_size(False)
table_opt.set_fontsize(10)
ax.set_title('p_ref = p_optimal')
plt.show()

4th Code - Spectral Analysis of Ocean Dynamics: Cross-Validation Framework¶

This fourth section of the notebook implements a sophisticated spectral analysis framework to evaluate SWOT data across different spatial scales. By examining power spectral density and coherence, it provides unique insight into SWOT's ability to resolve oceanographic features of different sizes.

Ocean dynamics operate across a continuum of spatial scales, from basin-wide circulations to small-scale turbulence. Understanding how well satellite measurements capture this multi-scale variability is essential for oceanographic applications. This code implements a comprehensive spectral analysis approach that goes beyond simple point-to-point comparisons.

This spectral framework:

  1. Processes MVP, SWOT, and DUACS data to extract continuous segments suitable for spectral analysis
  2. Computes power spectral density (PSD) using Welch's method with appropriate segmentation and windowing
  3. Calculates spectral coherence between different measurement systems
  4. Performs ensemble averaging across all transects to obtain robust spectral estimates
  5. Creates detailed visualizations showing energy distribution and coherence across spatial scales

By comparing spectral characteristics across three measurement systems (MVP, SWOT, and DUACS), this analysis reveals SWOT's strengths and limitations in capturing different scales of oceanographic variability. The coherence analysis quantifies the scale-dependent agreement between satellite and in-situ measurements, providing a rigorous assessment of SWOT's effective resolution.

These spectral insights are particularly valuable for understanding SWOT's performance in the mesoscale-to-submesoscale transition region, where traditional altimetry has struggled to resolve important oceanographic features that play crucial roles in energy transfer and biogeochemical processes.

1. Import Libraries¶

1.1 Core Scientific Stack¶

This section imports the essential libraries needed for our spectral analysis workflow:

  • numpy and pandas: Fundamental numerical and data analysis tools
  • matplotlib.pyplot: Visualization library for creating complex multi-panel plots
  • scipy.signal: Contains spectral analysis functions including Welch's method for PSD estimation and coherence calculations
  • gsw: The Gibbs SeaWater toolbox for oceanographic calculations following TEOS-10
  • scipy.io and xarray: Data input/output libraries for MATLAB files and NetCDF formats
  • geopy.distance: For calculating geodesic distances along MVP transects
  • scipy.interpolate: Provides griddata for spatial interpolation
  • tqdm: Progress monitoring for the transect processing loop
  • warnings: Used to suppress runtime warnings that might occur during spectral computations

The code also disables RuntimeWarnings, which commonly occur during spectral calculations with potentially NaN-containing data. This suppression helps maintain clean output while processing multiple transects.

This spectral analysis pipeline builds on our previous validation frameworks, but now focuses specifically on frequency-domain characteristics of the data. By examining how energy is distributed across different spatial scales, we can better understand the strengths and limitations of SWOT in capturing ocean variability.

In [ ]:
## 1.1 Core scientific stack
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import scipy.signal as signal
import warnings
import gsw
import scipy.io as sio
import geopy.distance
from datetime import datetime, timedelta
import glob, os
from scipy.interpolate import griddata
from tqdm import tqdm
import xarray as xr

warnings.filterwarnings("ignore", category=RuntimeWarning)

2. Configuration Parameters¶

The configuration section defines key threshold values that control the spectral analysis process. These parameters help ensure the statistical robustness of our results by filtering out segments that are too short to yield reliable spectral estimates.

MIN_SEGMENT_LENGTH = 10 establishes the minimum number of consecutive valid data points required to retain a segment for spectral analysis. This threshold is important for several reasons:

  1. Statistical significance: Reliable spectral estimation requires a sufficient number of samples. Segments that are too short would produce noisy, unreliable spectra that could skew our ensemble results.

  2. Frequency resolution: The minimum resolvable frequency is inversely proportional to the segment length. With 1-km sampling, a 10-point segment allows us to resolve wavelengths up to 10 km, which is appropriate for capturing submesoscale features.

  3. Computational efficiency: Discarding very short segments that would contribute minimal information improves processing efficiency.

This parameter directly affects the segmentation process that occurs later in the code, where continuous portions of data are identified between gaps (NaN values) and evaluated for inclusion in the spectral analysis. By requiring at least 10 valid points, we ensure that each segment can meaningfully contribute to our understanding of the spectral characteristics at scales relevant to SWOT validation.

In [ ]:
# 2. CONFIGURATION PARAMETERS
MIN_SEGMENT_LENGTH = 10   # minimum points to retain a segment

3. Date Utilities¶

3.1 MATLAB Ordinal-Date Conversion¶

The read_mat_datetime function solves a critical interoperability challenge: converting MATLAB's date representation to Python's datetime format. This conversion is essential for temporal alignment between MVP measurements and satellite passes.

MATLAB represents dates as days since January 0, 0000, while Python's datetime uses a different epoch. This discrepancy requires careful handling, especially for oceanographic data where precise temporal matching is crucial.

The function implements a robust approach to extract datetime information from potentially nested MATLAB structures:

  1. Flattening the array to handle multi-dimensional date arrays
  2. Recursive extraction to handle nested containers (lists, tuples, arrays)
  3. Splitting integer and fractional parts to process both the date and time components
  4. Applying the correct offset (-366 days) to account for the difference between MATLAB's and Python's date origins

The implementation is particularly robust against common issues in MATLAB-Python conversion:

  • It safely handles empty or malformed values
  • It extracts dates from nested structures regardless of nesting depth
  • It properly converts the fractional part to hours, minutes, and seconds

This utility function ensures accurate temporal alignment between in-situ measurements and satellite observations, which is fundamental for meaningful spectral comparison. Without precise temporal matching, we could inadvertently compare measurements representing different oceanographic conditions, leading to erroneous conclusions about SWOT's performance.

In [ ]:
## 3.1 MATLAB ordinal‐date conversion
def read_mat_datetime(date_array):
    """
    Convert MATLAB ordinal dates in 'date_array' to Python datetime objects.
    """
    dates_list = []
    array_flat = date_array.reshape(-1)
    for val_ in array_flat:
        vv = val_
        # Unwrap nested containers (arrays, lists, tuples) until a scalar remains
        while isinstance(vv, (list, tuple, np.ndarray)):
            if isinstance(vv, np.ndarray):
                vv = vv.item() if vv.size == 1 else vv.ravel()[0]
            else:
                vv = vv[0] if len(vv) else None
        if isinstance(vv, (int, float, np.integer, np.floating)) and not np.isnan(vv):
            frac = vv % 1
            try:
                dt_ = datetime.fromordinal(int(vv)) + timedelta(days=frac) - timedelta(days=366)
                dates_list.append(dt_)
            except (ValueError, OverflowError):
                pass
    return dates_list
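
As a quick sanity check, the core arithmetic of this conversion can be exercised on a single hypothetical MATLAB datenum (the value below is illustrative, not taken from the campaign data):

```python
from datetime import datetime, timedelta

# Hypothetical MATLAB datenum: the integer part encodes the date
# (days since 0-Jan-0000), the fractional part the time of day
matlab_datenum = 738977.5
frac = matlab_datenum % 1
py_dt = (datetime.fromordinal(int(matlab_datenum))
         + timedelta(days=frac)
         - timedelta(days=366))  # offset between MATLAB's and Python's origins
print(py_dt)  # 2023-04-01 12:00:00
```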

4. Data Loading Functions¶

4.1 Generic SSH/U/V Reader¶

The loadsshuv function provides a unified interface for loading satellite altimetry data from various sources, handling the specific format differences between SWOT and traditional altimetry products.

This utility addresses a common challenge in multi-sensor validation: different satellite data products follow different conventions, variable names, and storage formats. By abstracting these differences into a single function, we ensure consistent processing across all data sources.

The function processes two main data types:

  1. SWOT data (kwargs['type'] == 'swot'):

    • Applies spatial filtering based on the specified geographic area
    • Handles SWOT's specific data structure and naming conventions
    • Removes rows containing only NaN values for greater processing efficiency
    • Computes Absolute Dynamic Topography (ADT) from Sea Surface Height Anomaly (SSHA) and Mean Dynamic Topography (MDT)
  2. Standard altimetry products (e.g., DUACS):

    • Uses the variable name mapping provided in the varn dictionary
    • Extracts standard oceanographic variables directly from the dataset

For both data types, the function returns a standardized dictionary containing key oceanographic variables:

  • Geographic coordinates (lon, lat)
  • Sea surface height components (ssh, ssha)
  • Geostrophic velocity components (u, v)
  • Absolute Dynamic Topography (adt)

This consistency in output format is critical for spectral analysis, as it ensures that we're comparing equivalent quantities across different measurement platforms, facilitating direct comparison of their spectral characteristics.

In [ ]:
## 4.1 Generic SSH/U/V reader
def loadsshuv(fname, rep, varn, **kwargs):
    """
    Load longitude, latitude, SSH, velocity, and ADT fields from a NetCDF file.
    """
    if 'type' in kwargs and kwargs['type'] == 'swot':
        filei2 = glob.glob(rep + fname)[0]
        ds = xr.open_dataset(filei2)
        if 'area' in kwargs:
            area = kwargs['area']
            selection = ((ds.longitude > area[0]) &
                         (ds.longitude < area[2]) &
                         (ds.latitude > area[1]) &
                         (ds.latitude < area[3]))
            ds = ds.where(selection)
        lons = ds.longitude.to_masked_array()
        lats = ds.latitude.to_masked_array()
        us   = ds.ugos.to_masked_array()
        vs   = ds.vgos.to_masked_array()
        ssha = ds.ssha_filtered.to_masked_array()
        mss  = ds.mss.to_masked_array()
        mdt  = ds.mdt.to_masked_array()
        mask_row = np.all(np.isnan(us), axis=1)
        u    = us[~mask_row, :]
        v    = vs[~mask_row, :]
        lon  = lons[~mask_row, :]
        lat  = lats[~mask_row, :]
        ssha = ssha[~mask_row, :]
        mss  = mss[~mask_row, :]
        mdt  = mdt[~mask_row, :]
        adt = ssha + mdt
    else:
        filei = glob.glob(rep + fname)[0]
        ds = xr.open_dataset(filei)
        lon  = ds[varn['longitude']][:]
        lat  = ds[varn['latitude']][:]
        ssha = ds[varn['ssh']][:]
        adt  = ds['adt'][:]
        u    = ds[varn['u']][:]
        v    = ds[varn['v']][:]
    return {'lon': lon, 'lat': lat, 'ssh': ssha, 'u': u, 'v': v,
            'ssha': ssha, 'adt': adt}
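
The geographic selection logic can be illustrated on a tiny synthetic swath. The 2-D longitude/latitude layout mimics SWOT L3 files, but every coordinate value here is invented; `area` follows the same [lon_min, lat_min, lon_max, lat_max] convention the function expects:

```python
import numpy as np
import xarray as xr

# Invented 2x3 swath with 2-D coordinates, as in SWOT L3 Expert files
lon = xr.DataArray([[2.5, 4.0, 7.0], [3.5, 5.0, 6.0]],
                   dims=('num_lines', 'num_pixels'))
lat = xr.DataArray([[39.0, 40.0, 41.0], [42.0, 43.0, 45.0]],
                   dims=('num_lines', 'num_pixels'))
ds = xr.Dataset({'longitude': lon, 'latitude': lat})

area = [3, 38.5, 6.5, 44]  # [lon_min, lat_min, lon_max, lat_max]
selection = ((ds.longitude > area[0]) & (ds.longitude < area[2]) &
             (ds.latitude > area[1]) & (ds.latitude < area[3]))
masked = ds.where(selection)  # out-of-box points become NaN
print(int(np.isnan(masked.longitude).sum()))  # 3 points fall outside the box
```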

4.2 SWOT Colocation¶

This function addresses one of the core challenges in satellite validation: matching SWOT observations with MVP in-situ measurements in both space and time. Proper colocation is essential for meaningful spectral comparison, as we need to ensure that both measurement systems are observing the same oceanographic features.

The load_SWOT function performs this matching through the following steps:

  1. Geographic domain determination:

    domain = [np.floor(np.min(mvp['lon'])) - 0.5,
              np.floor(np.min(mvp['lat'])) - 0.5,
              np.ceil(np.max(mvp['lon'])) + 0.5,
              np.ceil(np.max(mvp['lat'])) + 0.5]
    

    This creates a rectangular bounding box around the MVP transect with a 0.5° buffer in all directions, ensuring we capture all relevant SWOT data while minimizing computational load.

  2. Temporal matching: The function identifies SWOT files corresponding to the specified date, ensuring temporal agreement between the datasets.

  3. Spatial interpolation: Using scipy.interpolate.griddata, the function interpolates SWOT ADT values onto the exact MVP sampling locations:

    adtc = griddata(points, swot['adt'].flatten(), (mvp['lon'], mvp['lat']), method='linear')
    

This spatial matching is particularly important for spectral analysis because:

  • Both datasets must sample the same physical space to capture identical oceanographic features
  • The interpolation to MVP points ensures that both signals have identical spatial sampling, which is necessary for direct spectral comparison
  • Linear interpolation preserves the spectral characteristics of the satellite data while providing values at the required locations

The resulting interpolated SWOT ADT values are directly comparable to MVP dynamic height measurements, enabling the subsequent spectral analysis to evaluate how well SWOT captures oceanographic variability across different spatial scales.

In [ ]:
## 4.2 SWOT colocation
def load_SWOT(mvp, dayv, swotdir, savefig, **kwargs):
    """
    Load SWOT data and interpolate ADT onto MVP points.
    """
    domain = [np.floor(np.min(mvp['lon'])) - 0.5,
              np.floor(np.min(mvp['lat'])) - 0.5,
              np.ceil(np.max(mvp['lon'])) + 0.5,
              np.ceil(np.max(mvp['lat'])) + 0.5]
    dayv_dt = datetime.strptime(dayv, "%Y-%m-%d")
    varn = {'longitude': 'longitude', 'latitude': 'latitude',
            'u': 'ugos', 'v': 'vgos', 'ssh': 'ssha_filtered'}
    fname = glob.glob(swotdir + 'SWOT_L3_LR_SSH_Expert_*_' +
                      dayv_dt.strftime('%Y%m%d') + '*_*.nc')
    if not fname:
        print(f"No SWOT file found for date {dayv}")
        return None
    swot = loadsshuv(fname[0], '', varn, type='swot', area=domain)
    points = np.column_stack((swot['lon'].flatten(), swot['lat'].flatten()))
    adtc = griddata(points, swot['adt'].flatten(), (mvp['lon'], mvp['lat']), method='linear')
    return np.squeeze(adtc)
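
The interpolation step can be checked in isolation: linear `griddata` interpolation reproduces a planar field exactly, so a made-up ADT of the form 0.1·lon + 0.05·lat should be recovered exactly at arbitrary track points inside the grid:

```python
import numpy as np
from scipy.interpolate import griddata

# Hypothetical planar ADT field sampled on a coarse regular grid
lon_g, lat_g = np.meshgrid(np.linspace(3, 6, 10), np.linspace(39, 43, 10))
adt_g = 0.1 * lon_g + 0.05 * lat_g
points = np.column_stack((lon_g.ravel(), lat_g.ravel()))

# Invented "MVP" track points inside the grid
track_lon = np.array([4.0, 4.5, 5.0])
track_lat = np.array([40.0, 40.5, 41.0])
adt_on_track = griddata(points, adt_g.ravel(),
                        (track_lon, track_lat), method='linear')
print(adt_on_track)  # equals 0.1*lon + 0.05*lat at each track point
```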

4.3 DUACS Colocation¶

The load_DUACS function complements the SWOT colocation by performing similar spatial-temporal matching for DUACS (Data Unification and Altimeter Combination System) altimetry products. DUACS represents the current operational standard for multi-mission satellite altimetry, making it an important reference point for evaluating SWOT's performance.

Like its SWOT counterpart, this function performs three key operations:

  1. Temporal matching: The function identifies DUACS files corresponding to the specified date using their standard naming convention: nrt_europe_allsat_phy_l4_YYYYMMDD_*.nc, where "nrt" stands for Near-Real-Time products.

  2. Data loading: DUACS data is loaded with appropriate variable mapping, noting that DUACS uses different variable names than SWOT (e.g., 'sla' instead of 'ssha_filtered').

  3. Spatial interpolation: The function handles DUACS' regular latitude-longitude grid by first creating a meshgrid before interpolation:

    X, Y = np.meshgrid(duacs['lon'].values, duacs['lat'].values)
    

    Then it performs linear interpolation to the MVP points similarly to the SWOT colocation.

Including DUACS in our analysis serves several important purposes:

  • It provides a benchmark against which to evaluate SWOT's performance
  • It helps distinguish issues specific to SWOT from limitations common to all satellite altimetry
  • It allows us to assess whether SWOT's innovative wide-swath technology offers spectral improvements over traditional nadir altimetry

By colocating both SWOT and DUACS data onto the same MVP sampling points, we create a three-way comparison framework that will reveal how the spectral characteristics differ across measurement platforms.

In [ ]:
## 4.3 DUACS colocation
def load_DUACS(mvp, dayv, duacsdir, savefig, **kwargs):
    """
    Load DUACS data and interpolate ADT onto MVP points.
    """
    dayv_dt = datetime.strptime(dayv, "%Y-%m-%d")
    varn = {'longitude': 'longitude', 'latitude': 'latitude',
            'u': 'ugos', 'v': 'vgos', 'ssh': 'sla'}
    fname = glob.glob(duacsdir + 'nrt_europe_allsat_phy_l4_' +
                      dayv_dt.strftime('%Y%m%d') + '_*.nc')
    if not fname:
        print(f"No DUACS file found for date {dayv}")
        return None
    duacs = loadsshuv(fname[0], '', varn)
    X, Y = np.meshgrid(duacs['lon'].values, duacs['lat'].values)
    points = np.column_stack((X.flatten(), Y.flatten()))
    adtd = griddata(points, duacs['adt'].values.flatten(), (mvp['lon'], mvp['lat']), method='linear')
    return np.squeeze(adtd)

5. Spectral Analysis Utilities¶

5.1 Segmentation and Power Spectral Density¶

5.1.1 NaN-Aware Splitting¶

The split_on_nan function addresses a critical challenge in oceanographic spectral analysis: handling missing or invalid data. Unlike controlled laboratory settings, oceanographic measurements often contain gaps or regions of invalid data due to sensor limitations, atmospheric interference, or coastal contamination.

This function implements a sophisticated algorithm to identify continuous valid segments within a potentially fragmented dataset:

  1. It scans the array sequentially, tracking consecutive NaN values
  2. When a run of max_consecutive_nans or more NaNs is detected, it marks the end of the current segment
  3. It extracts the valid portion of data between the start and end indices
  4. It then continues scanning from the current position to identify additional segments

The tolerance parameter max_consecutive_nans (default = 4) provides flexibility in handling isolated missing values:

  • A small value (e.g., 1-2) creates more segments but may discard useful data
  • A larger value tolerates more gaps but risks introducing artifacts into spectral calculations
  • The default of 4 represents a balance that preserves segment integrity while maximizing data usage

This segmentation approach is particularly important for spectral analysis because Fourier-based methods require continuous data without gaps. By identifying valid segments, we can compute accurate spectral estimates for each segment independently and then combine them to form ensemble averages.

The resulting segments form the foundation for our subsequent spectral calculations, ensuring that the power spectral density estimates are based on statistically sound data sequences without the contamination that would result from improperly handled missing values.

In [ ]:
### 5.1.1 NaN‐aware splitting
def split_on_nan(data_array, max_consecutive_nans=4):
    """
    Split a 1-D array into continuous segments separated by ≥ max_consecutive_nans NaNs.
    """
    segments = []
    n = len(data_array)
    start_idx = 0
    nan_count = 0
    for i in range(n):
        nan_count = nan_count + 1 if np.isnan(data_array[i]) else 0
        if nan_count >= max_consecutive_nans:
            # Exclusive end index: position where the NaN run began
            end_idx = i - max_consecutive_nans + 1
            if end_idx > start_idx:
                segm = data_array[start_idx:end_idx]
                if len(segm) > 1:
                    segments.append(segm)
            start_idx = i + 1
            nan_count = 0
    if start_idx < n:
        segm = data_array[start_idx:]
        if len(segm) > 1:
            segments.append(segm)
    return segments
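
The same gap-splitting idea can be expressed more compactly with run-length boundaries. This is an independent sketch (not the function above), and it stitches straight across sub-threshold gaps instead of leaving short NaN runs to be cleaned downstream:

```python
import numpy as np

def split_on_gaps(x, max_nans=4):
    """Split x into data segments separated by >= max_nans consecutive NaNs."""
    isnan = np.isnan(x)
    # indices where the array switches between NaN runs and data runs
    change = np.flatnonzero(np.diff(isnan.astype(int)) != 0) + 1
    segments, current = [], []
    for idx in np.split(np.arange(len(x)), change):
        if isnan[idx[0]] and len(idx) >= max_nans:
            if len(current) > 1:
                segments.append(np.array(current))
            current = []
        elif not isnan[idx[0]]:
            current.extend(x[idx])
        # NaN runs shorter than max_nans are skipped, stitching across them
    if len(current) > 1:
        segments.append(np.array(current))
    return segments

x = np.array([1., 2., 3., np.nan, np.nan, np.nan, np.nan, np.nan, 7., 8.])
print([s.tolist() for s in split_on_gaps(x)])  # [[1.0, 2.0, 3.0], [7.0, 8.0]]
```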

5.1.2 Welch PSD¶

The welch_psd_1D function implements Welch's method for power spectral density (PSD) estimation, a fundamental technique in spectral analysis that improves upon the basic periodogram by reducing noise while maintaining good frequency resolution.

Welch's method, developed by Peter Welch in 1967, involves:

  1. Segmentation: Dividing the time series into overlapping segments
  2. Windowing: Applying a window function (in this case, a Hann window) to each segment to reduce spectral leakage
  3. FFT computation: Calculating the periodogram for each windowed segment
  4. Averaging: Computing the mean of these periodograms to obtain a smoother estimate

Mathematically, the Welch PSD estimate is given by:

$$P_{xx}^{\mathrm{Welch}}(f) = \frac{1}{K} \sum_{i=0}^{K-1} \frac{1}{L\,U} \left| \sum_{n=0}^{L-1} w(n)\, x_i(n)\, e^{-j2\pi fn} \right|^2$$

Where:

  • $x_i(n)$ is the $i$-th segment of the signal
  • $w(n)$ is the window function (Hann window)
  • $L$ is the segment length
  • $K$ is the number of segments
  • $U = \frac{1}{L}\sum_{n=0}^{L-1} w^2(n)$ is the window power normalization, which makes the estimate independent of the window's amplitude

The implementation in our function includes several key parameters:

  • nperseg = min(int(n // 2), 256): Sets the length of each segment, balancing between frequency resolution (larger values) and estimate variance (smaller values)
  • noverlap = nperseg // 2: Sets 50% overlap between consecutive segments, which helps recover information lost at the edges due to windowing
  • window = 'hann': Uses the Hann window, which offers a good compromise between spectral leakage reduction and frequency resolution
  • detrend = 'constant': Removes the mean from each segment to prevent spectral leakage at low frequencies

The function returns:

  • freq: An array of frequency points in cycles per sample unit
  • psd: The corresponding power spectral density estimates

The minimum length requirement of 8 points ensures sufficient data for meaningful FFT computation, preventing potential issues with very short segments that might arise from the segmentation process.

This implementation of Welch's method provides a robust foundation for analyzing the spectral characteristics of our oceanographic data, enabling the identification of dominant spatial scales and the comparison of spectral signatures across different measurement platforms.

In [ ]:
### 5.1.2 Welch PSD
def welch_psd_1D(data_array, fs=1.0):
    """
    Compute 1-D PSD via Welch’s method (Hann window).
    """
    n = len(data_array)
    if n < 8:
        return None, None
    nperseg = min(int(n // 2), 256)
    freq, psd = signal.welch(data_array, fs=fs, nperseg=nperseg,
                             noverlap=nperseg // 2, window='hann',
                             detrend='constant')
    return freq, psd
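
A synthetic transect confirms that the estimator recovers a known scale. With 1-km sampling (fs = 1 cycle/km), a pure 20-km sine should produce a spectral peak near 0.05 cycles/km; the signal below is artificial and uses the same Welch parameters as the function above:

```python
import numpy as np
import scipy.signal as signal

# Artificial 512-km transect sampled every 1 km: a pure 20-km wavelength sine
x = np.arange(512)
data = np.sin(2 * np.pi * x / 20.0)

nperseg = min(len(data) // 2, 256)
freq, psd = signal.welch(data, fs=1.0, nperseg=nperseg,
                         noverlap=nperseg // 2, window='hann',
                         detrend='constant')
peak = freq[np.argmax(psd)]
print(peak)  # close to 0.05 cycles/km, i.e. a 20-km wavelength
```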

5.1.3 PSD for Segmented Data¶

The compute_segments_psd function extends our spectral analysis capability to handle real-world oceanographic data with gaps. It integrates the NaN-aware segmentation with Welch's PSD estimation to generate spectral estimates for multiple continuous segments within a potentially fragmented dataset.

This function operates through a multi-step process:

  1. Segmentation: First, it calls split_on_nan to divide the input array into continuous segments separated by runs of NaN values:

    segments = split_on_nan(data_array, max_nan)
    

    Where max_nan (default: 4) determines the tolerance for isolated missing values.

  2. Filtering: For each segment, it performs additional cleaning and validation:

    seg_clean = seg[~np.isnan(seg)]
    if len(seg_clean) < min_length:
        continue
    

    This ensures that any remaining NaN values are removed, and segments shorter than min_length (default: 10) are discarded as they would not provide statistically reliable spectral estimates.

  3. PSD Computation: For each valid segment, it computes the power spectral density using Welch's method:

    f, p = welch_psd_1D(seg_clean, fs)
    

    Where fs is the sampling frequency (default: 1.0, appropriate for spatial data with 1 km sampling).

  4. Result Collection: Valid frequency-PSD pairs are collected for subsequent ensemble averaging:

    results.append((f, p))
    

This approach has several advantages for oceanographic spectral analysis:

  • It maximizes data utilization by processing each valid segment independently
  • It maintains spectral integrity by excluding segments that are too short or contain too many gaps
  • It provides a collection of spectral estimates that can later be combined through ensemble averaging

The parameter min_length=10 is particularly important as it ensures that each segment contains sufficient points for meaningful spectral estimation. With 1 km sampling, this corresponds to a minimum spatial extent of 10 km, which is appropriate for resolving submesoscale features while ensuring statistical robustness.

This segmented approach is essential for analyzing MVP and satellite altimetry data along transects that may contain data gaps due to sensor issues, land contamination, or other operational constraints.

In [ ]:
### 5.1.3 PSD for segmented data
def compute_segments_psd(data_array, max_nan=4, fs=1.0, min_length=10):
    """
    Compute Welch PSD for each sufficiently long segment of 'data_array'.
    """
    segments = split_on_nan(data_array, max_nan)
    results = []
    for seg in segments:
        seg_clean = seg[~np.isnan(seg)]
        if len(seg_clean) < min_length:
            continue
        f, p = welch_psd_1D(seg_clean, fs)
        if f is not None and p is not None:
            results.append((f, p))
    return results

5.1.4 Log–Log Regression of PSD¶

The loglog_regression function quantifies a fundamental characteristic of oceanographic spectra: their power-law scaling behavior across different spatial scales. This scaling is crucial for understanding the energy cascade in ocean dynamics and validating satellite observations against theoretical predictions.

Turbulent flows in geophysical fluid dynamics often exhibit power-law behavior in their energy spectra, described by:

$$E(k) \propto k^m$$

Where:

  • $E(k)$ is the energy at wavenumber $k$
  • $m$ is the spectral slope

In log-log space, this power-law relationship becomes linear:

$$\log(E(k)) = m\log(k) + c$$

The function implements this log-log regression by:

  1. Filtering valid data points: It selects only points where both frequency and PSD are positive non-NaN values:

    valid = (~np.isnan(freq)) & (~np.isnan(psd)) & (psd > 0) & (freq > 0)
    

    This filtering is necessary because logarithms are only defined for positive values.

  2. Checking data sufficiency: It requires at least two valid points to perform a meaningful regression:

    if np.sum(valid) < 2:
        return np.nan, np.nan
    
  3. Performing linear regression in log-log space:

    m, c = np.polyfit(np.log10(freq[valid]), np.log10(psd[valid]), 1)
    

    This returns the slope m and intercept c of the best-fit line in log-log space.

In oceanographic contexts, the spectral slope has specific physical interpretations:

  • A slope of -5/3 (-1.67) indicates Kolmogorov-type turbulence, typical of three-dimensional turbulent flow
  • A slope of -3 is associated with quasi-geostrophic turbulence in the forward energy cascade
  • A slope of -2 often appears in surface quasi-geostrophic dynamics

By quantifying these slopes for different datasets (MVP, SWOT, DUACS), we can assess:

  • How well satellite measurements capture the expected energy scaling
  • Whether there are systematic differences in the energy distribution across scales
  • If there are characteristic scales where the spectral behavior changes

This analysis helps validate whether SWOT accurately reproduces the physical scaling laws observed in direct oceanographic measurements and expected from theoretical models.

In [ ]:
### 5.1.4 Log–log regression of PSD
def loglog_regression(freq, psd):
    """
    Fit log10(PSD) = m log10(freq) + c and return slope m, intercept c.
    """
    valid = (~np.isnan(freq)) & (~np.isnan(psd)) & (psd > 0) & (freq > 0)
    if np.sum(valid) < 2:
        return np.nan, np.nan
    m, c = np.polyfit(np.log10(freq[valid]), np.log10(psd[valid]), 1)
    return m, c
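
Applying the regression to an exact synthetic power law recovers the imposed slope, which makes for a useful unit test; the $k^{-2}$ spectrum below is artificial:

```python
import numpy as np

# Artificial spectrum following an exact k^-2 power law
freq = np.logspace(-2, 0, 50)       # wavenumbers in cycles/km
psd = 1e-3 * freq ** (-2.0)

valid = (~np.isnan(freq)) & (~np.isnan(psd)) & (psd > 0) & (freq > 0)
m, c = np.polyfit(np.log10(freq[valid]), np.log10(psd[valid]), 1)
print(round(m, 1))  # -2.0
```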

5.2 Spectral Coherence¶

5.2.1 Paired NaN-Aware Splitting¶

The split_on_nan_pair function extends our segmentation approach to handle paired datasets, which is essential for coherence analysis between MVP and satellite measurements. Unlike the single-array version, this function identifies continuous segments where both datasets have valid values simultaneously.

This synchronized segmentation is critical because coherence analysis requires corresponding measurements from both sources. The function introduces several key innovations:

  1. Combined NaN masking: The function creates a unified mask that identifies points where data is missing in either array:

    combined_nan = np.isnan(data1) | np.isnan(data2)
    

    This ensures that segments only include points where both datasets have valid measurements.

  2. Synchronized extraction: When a valid segment is identified, the function extracts corresponding portions from both input arrays and stores them as pairs:

    segments.append((segm1, segm2))
    
  3. Consistent gap tolerance: The max_consecutive_nans parameter (default: 4) applies the same gap criterion to both datasets: a split is triggered once a run of max_consecutive_nans joint gaps is reached, so shorter runs of missing values remain inside a segment.

This paired segmentation strategy has important implications for our coherence analysis:

  • It ensures that spectral coherence is computed only for regions where both datasets have sufficient valid measurements
  • It maintains the spatial alignment between MVP and satellite data, which is essential for meaningful coherence estimation
  • It handles the practical reality that gaps may occur at different locations in each dataset due to their independent measurement systems

By enforcing this strict paired segmentation, we create a solid foundation for robust coherence analysis that accurately reflects the relationship between in-situ and satellite measurements across different spatial scales.

In [ ]:
### 5.2.1 Paired NaN‐aware splitting
def split_on_nan_pair(data1, data2, max_consecutive_nans=4):
    """
    Split paired arrays into segments, honouring NaNs in either input.
    """
    data1 = np.asarray(data1)
    data2 = np.asarray(data2)
    combined_nan = np.isnan(data1) | np.isnan(data2)

    segments = []
    n = len(data1)
    start_idx = 0
    nan_count = 0
    for i in range(n):
        nan_count = nan_count + 1 if combined_nan[i] else 0
        if nan_count >= max_consecutive_nans:
            end_idx = i - max_consecutive_nans + 1   # keep the last valid point before the NaN run
            if end_idx > start_idx:
                segm1 = data1[start_idx:end_idx]
                segm2 = data2[start_idx:end_idx]
                if len(segm1) > 1:
                    segments.append((segm1, segm2))
            start_idx = i + 1                        # resume after the current NaN
            nan_count = 0
    if start_idx < n:
        segm1 = data1[start_idx:]
        segm2 = data2[start_idx:]
        if len(segm1) > 1:
            segments.append((segm1, segm2))
    return segments
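A toy example illustrates the combined-mask idea. This sketch uses zero gap tolerance (it splits at every joint NaN rather than honouring max_consecutive_nans), so it is a simplification of the function above, with made-up values.

```python
import numpy as np

# Paired series with gaps at different indices in each array.
a = np.array([0.1, 0.2, np.nan, 0.4, 0.5, 0.6])
b = np.array([1.0, np.nan, 1.2, 1.3, 1.4, 1.5])

# A point is unusable if EITHER array is NaN there.
combined_nan = np.isnan(a) | np.isnan(b)   # [F, T, T, F, F, F]

# Contiguous runs of jointly valid points (no gap tolerance in this sketch).
valid = ~combined_nan
edges = np.flatnonzero(np.diff(np.r_[0, valid.astype(int), 0]))
segments = [(a[s:e], b[s:e]) for s, e in zip(edges[::2], edges[1::2])]
# Two paired segments: indices [0:1] and [3:6]
```

Note how the NaN at index 1 of `b` and the NaN at index 2 of `a` jointly exclude both positions, even though each array alone is valid at one of them.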

5.2.2 Welch Coherence¶

The welch_coherence_1D function implements the calculation of spectral coherence between two signals using Welch's method, providing a frequency-dependent measure of their linear relationship. Spectral coherence is a critical metric for validating SWOT data, as it quantifies how well the satellite captures the spatial patterns observed in MVP measurements across different scales.

Mathematically, the magnitude-squared coherence between two signals $x(t)$ and $y(t)$ is defined as:

$$C_{xy}(f) = \frac{|P_{xy}(f)|^2}{P_{xx}(f)P_{yy}(f)}$$

Where:

  • $P_{xy}(f)$ is the cross-spectral density between $x$ and $y$
  • $P_{xx}(f)$ and $P_{yy}(f)$ are the auto-spectral densities of $x$ and $y$ respectively

The coherence value ranges from 0 to 1, where:

  • $C_{xy}(f) = 1$ indicates perfect linear relationship between the signals at frequency $f$
  • $C_{xy}(f) = 0$ indicates no linear relationship at that frequency

The function utilizes SciPy's signal.coherence implementation, which applies Welch's method for consistent and robust coherence estimation. Key parameters include:

  • nperseg: Sets the length of each segment, balancing frequency resolution against statistical confidence
  • window='hann': Applies a Hann window to reduce spectral leakage
  • noverlap=nperseg//2: Uses 50% overlap between segments to ensure smooth transitions
  • detrend='constant': Removes the mean from each segment to eliminate DC offset effects

The minimum signal length requirement of 8 points ensures statistical reliability of the coherence estimate.

In the context of SWOT validation, coherence analysis provides several key insights:

  1. Scale-dependent agreement: It reveals at which spatial scales satellite and in-situ measurements show strongest agreement
  2. Noise floor identification: Low coherence at high frequencies often indicates the scale at which measurement noise dominates
  3. Satellite performance assessment: Comparing SWOT-MVP coherence with DUACS-MVP coherence quantifies SWOT's improvement over conventional altimetry

This coherence analysis is particularly valuable for understanding the effective resolution of SWOT measurements and identifying scales where additional calibration or improved processing might enhance agreement with in-situ observations.

In [ ]:
### 5.2.2 Welch coherence
def welch_coherence_1D(signal1, signal2, fs=1.0):
    """
    Compute Welch spectral coherence between two signals.
    """
    n = len(signal1)
    if n < 8:
        return None, None
    nperseg = min(int(n // 2), 256)
    freq, coh = signal.coherence(signal1, signal2, fs=fs, window='hann',
                                 nperseg=nperseg, noverlap=nperseg // 2,
                                 detrend='constant')
    return freq, coh
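The estimator can be checked on synthetic data: two noisy copies of the same sine should be almost perfectly coherent at the sine's frequency. The sample length, noise level, and frequency below are arbitrary test choices, with Welch parameters mirroring those described above.

```python
import numpy as np
from scipy import signal

rng = np.random.default_rng(0)
n = 1024
x = np.arange(n)
s = np.sin(2 * np.pi * 0.0625 * x)          # shared signal at 0.0625 cycles/sample
y = s + 0.1 * rng.standard_normal(n)        # independent noise realisations
z = s + 0.1 * rng.standard_normal(n)

nperseg = min(n // 2, 256)
freq, coh = signal.coherence(y, z, fs=1.0, window='hann',
                             nperseg=nperseg, noverlap=nperseg // 2,
                             detrend='constant')
idx = np.argmin(np.abs(freq - 0.0625))
# coh[idx] is close to 1; at noise-dominated frequencies coherence is low.
```

The 0.0625 cycles/sample choice places the sine exactly on a Welch frequency bin (16/256), avoiding leakage ambiguity in the check.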

5.2.3 Coherence for Segmented Data¶

The compute_segments_coherence function extends our coherence analysis capability to handle real-world oceanographic data with gaps and discontinuities. It integrates the paired segmentation approach with Welch's coherence estimation to generate coherence estimates for multiple continuous segments where both datasets have valid measurements.

This function follows a structured approach:

  1. Paired segmentation: It uses split_on_nan_pair to identify continuous segments where both signals have sufficient valid data:

    segments = split_on_nan_pair(data1, data2, max_nan)
    
  2. Additional cleaning: For each segment pair, it applies a final filtering step to remove any remaining NaN values:

    mask_valid = (~np.isnan(seg1)) & (~np.isnan(seg2))
    seg1_clean = seg1[mask_valid]
    seg2_clean = seg2[mask_valid]
    
  3. Length validation: It enforces a minimum segment length to ensure statistical reliability:

    if len(seg1_clean) < min_length:
        continue
    
  4. Coherence computation: For each valid segment pair, it calculates the coherence spectrum using Welch's method:

    f, co = welch_coherence_1D(seg1_clean, seg2_clean, fs=fs)
    
  5. Result collection: Valid frequency-coherence pairs are collected for subsequent ensemble averaging.

This segmented approach to coherence estimation provides several advantages:

  • Robustness: By processing multiple independent segments, we obtain a more statistically reliable coherence estimate
  • Data utilization: It maximizes the use of available data despite gaps and discontinuities
  • Scale resolution: The ensemble of segments captures coherence characteristics across the range of spatial scales represented in our data

The parameter min_length=10 ensures that coherence estimates are based on sufficiently long segments to provide meaningful information about the scale-dependent relationship between in-situ and satellite measurements.

In our validation framework, this function enables us to quantify how well SWOT captures the spatial patterns observed by MVP at different scales, providing critical insight into the satellite's effective resolution and accuracy.

In [ ]:
### 5.2.3 Coherence for segmented data
def compute_segments_coherence(data1, data2, max_nan=4, fs=1.0, min_length=10):
    """
    Compute Welch coherence for each paired segment of (data1, data2).
    """
    segments = split_on_nan_pair(data1, data2, max_nan)
    results = []
    for seg1, seg2 in segments:
        mask_valid = (~np.isnan(seg1)) & (~np.isnan(seg2))
        seg1_clean = seg1[mask_valid]
        seg2_clean = seg2[mask_valid]
        if len(seg1_clean) < min_length:
            continue
        f, co = welch_coherence_1D(seg1_clean, seg2_clean, fs=fs)
        if f is not None and co is not None:
            results.append((f, co))
    return results
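Putting the pieces together, the segment-then-cohere pattern can be sketched in a self-contained form. For brevity this sketch uses zero gap tolerance instead of split_on_nan_pair, and the record length, gap position, and noise level are illustrative assumptions.

```python
import numpy as np
from scipy import signal

def paired_segments(a, b):
    """Contiguous runs where both arrays are valid (no gap tolerance)."""
    valid = ~(np.isnan(a) | np.isnan(b))
    edges = np.flatnonzero(np.diff(np.r_[0, valid.astype(int), 0]))
    return [(a[s:e], b[s:e]) for s, e in zip(edges[::2], edges[1::2])]

rng = np.random.default_rng(1)
x = np.arange(700, dtype=float)
s = np.sin(2 * np.pi * 0.05 * x)
a = s + 0.2 * rng.standard_normal(x.size)
b = s + 0.2 * rng.standard_normal(x.size)
a[300:310] = np.nan                          # a gap splits the record in two

results = []
for seg_a, seg_b in paired_segments(a, b):
    if len(seg_a) < 10:                      # mirrors the min_length guard
        continue
    nperseg = min(len(seg_a) // 2, 256)
    f, co = signal.coherence(seg_a, seg_b, fs=1.0, nperseg=nperseg,
                             noverlap=nperseg // 2)
    results.append((f, co))
# One (frequency, coherence) pair per valid segment — here, two.
```

Each segment yields an independent coherence spectrum; ensemble averaging over such pairs is what Section 7 implements.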

6. Loop Over Profiling Lines¶

6.1 Initialization of Accumulators¶

This section sets up the data structures needed to accumulate spectral results across multiple MVP transects. By analyzing all available transects together, we can generate statistically robust ensemble spectra that capture the general characteristics of the region while minimizing the influence of transect-specific anomalies.

The code initializes two types of containers:

  1. Power Spectral Density (PSD) containers:

    • global_freq_mvp, global_psd_mvp: For in-situ MVP measurements
    • global_freq_swot, global_psd_swot: For SWOT satellite data
    • global_freq_duacs, global_psd_duacs: For DUACS multi-mission altimetry
  2. Spectral Coherence containers:

    • global_freq_coh_mvp_swot, global_coh_mvp_swot: For MVP-SWOT coherence
    • global_freq_coh_mvp_duacs, global_coh_mvp_duacs: For MVP-DUACS coherence
    • global_freq_coh_swot_duacs, global_coh_swot_duacs: For SWOT-DUACS coherence

These global accumulators will store the spectral results from each transect (PL5 through PL24), which will later be combined through ensemble averaging to produce our final spectral estimates.

The approach of processing multiple transects (20 in total) provides several advantages:

  1. Statistical robustness: By combining data from many independent transects, we reduce the influence of random variability and measurement errors.

  2. Spatial coverage: The transects collectively sample a broader area of the Northwestern Mediterranean, capturing a more representative picture of the region's dynamics.

  3. Feature diversity: Different transects may intersect various oceanographic features (eddies, fronts, etc.), providing a comprehensive view of how well SWOT captures different types of structures.

This ensemble approach is particularly important for spectral analysis, where averaging across multiple independent estimates significantly improves the statistical confidence in the resulting spectra and coherence functions.

In [ ]:
## 6.1 Initialisation of accumulators
PLs = np.arange(5, 25)                       # profiling lines 5 – 24

# Power-spectral-density (PSD) containers
global_freq_mvp,   global_psd_mvp   = [], []
global_freq_swot,  global_psd_swot  = [], []
global_freq_duacs, global_psd_duacs = [], []

# Spectral-coherence containers
global_freq_coh_mvp_swot,  global_coh_mvp_swot   = [], []
global_freq_coh_mvp_duacs, global_coh_mvp_duacs  = [], []
global_freq_coh_swot_duacs, global_coh_swot_duacs = [], []

6.2 Iterative Processing¶

This section implements the core processing loop that systematically analyzes each MVP transect (PL5 through PL24), computing spectral characteristics and accumulating results for ensemble analysis.

6.2.1 MVP Data Loading and Dynamic Height Computation¶

The first step in processing each transect is loading the MVP data and computing dynamic height fields. This process follows established oceanographic principles:

  1. Data loading: The code loads two complementary MATLAB files:

    • One containing metadata (particularly timestamps)
    • Another containing the actual measurement data (temperature, salinity, pressure)
  2. Date extraction: The function converts MATLAB dates to Python datetime objects using the previously defined utility function.

  3. Thermodynamic calculations: Following the TEOS-10 standard, the code:

    • Converts Practical Salinity (SP) to Absolute Salinity (SA)
    • Converts in-situ temperature to Conservative Temperature (CT)
  4. Dynamic height calculation: For each valid profile in the transect, the code:

    • Identifies valid measurement points (non-NaN values)
    • Uses the maximum pressure as the reference level (p_ref = p_max)
    • Computes dynamic height using the gsw.geo_strf_dyn_height function
    • Normalizes by gravitational acceleration (g = 9.81 m/s²)
  5. Anomaly calculation: Finally, the code computes centered anomalies by subtracting the mean:

    dhc = dh_array - np.nanmean(dh_array)

This centered dynamic height anomaly (dhc) is particularly important for spectral analysis as it:

  • Removes the constant offset that would otherwise dominate the spectrum
  • Focuses on the spatial variations that reveal oceanographic structures
  • Makes the MVP measurements directly comparable to satellite anomalies

The implementation includes robust error handling to skip transects with invalid dates or where all dynamic height values are NaN, ensuring that only viable data contributes to our spectral analysis.

In [ ]:
## 6.2 Iterative processing
for PL in PLs:
    print(f"\n[SPECTRAL ANALYSIS] – PL{PL} in progress...")
    try:
        # ### 6.2.1 MVP data loading and dynamic-height computation
        mat_file = sio.loadmat(f'/home/bessiere/Documents/MVP_TOOLS/BIOSWOTMED/MVP/L2/PL{PL}.mat')
        date_array = mat_file['metadata']['date']
        dates_mvp = read_mat_datetime(date_array)
        if len(dates_mvp) == 0:
            raise ValueError(f"No valid date in PL{PL}")

        mat = sio.loadmat(f'/home/bessiere/Documents/MVP_TOOLS/BIOSWOTMED/MVP/L2_python/PL{PL}_python.mat')
        lon_mvp = mat['lon']
        lat_mvp = mat['lat']
        p_mvp   = mat['pres']
        t_mvp   = mat['temp']
        SP_mvp  = mat['sal']

        SA = gsw.SA_from_SP(SP_mvp, p_mvp, lon_mvp.T, lat_mvp.T)
        CT = gsw.CT_from_t(SA, t_mvp, p_mvp)
        g_ = 9.81
        dh_list = []
        for i in range(SA.shape[0]):
            valid_ = ~np.isnan(SA[i])
            if np.sum(valid_) == 0:
                dh_list.append(np.nan)
                continue
            p_ref = np.nanmax(p_mvp[i][valid_])                       # p_ref = p_max
            if np.isnan(p_ref):
                dh_list.append(np.nan)
                continue
            dh_i = gsw.geo_strf_dyn_height(SA[i][valid_], CT[i][valid_],
                                           p_mvp[i][valid_], p_ref=p_ref)[0] / g_
            if isinstance(dh_i, np.ndarray):
                dh_i = dh_i.item() if dh_i.size == 1 else dh_i[0]
            dh_list.append(dh_i)
        dh_array = np.array(dh_list, dtype=float)
        if np.all(np.isnan(dh_array)):
            raise ValueError(f"PL{PL}: All dh are NaN")

        dhc = dh_array - np.nanmean(dh_array)                         # centred anomaly

6.2.2 MVP Transect Dataframe and Distance Along Track¶

After computing dynamic height, the code organizes the MVP data into a structured format and calculates the cumulative distance along the transect. This spatial referencing is essential for spectral analysis, as it establishes the physical distance over which variations occur.

The process involves two main steps:

  1. Data organization: The code creates a pandas DataFrame containing:

    • Raw dynamic height (dh)
    • Centered dynamic height anomalies (dhc)
    • Geographic coordinates (longitude and latitude)

    This structured format facilitates subsequent processing and analysis while ensuring that the different variables remain properly aligned.

  2. Cumulative distance calculation: For each point along the transect, the code calculates the geodesic distance from the starting point:

    dist_along[i] = dist_along[i-1] + geopy.distance.distance(
        (df_mvp_raw['lat'].values[i-1], df_mvp_raw['lon'].values[i-1]),
        (df_mvp_raw['lat'].values[i],   df_mvp_raw['lon'].values[i])
    ).km
    

    This calculation uses geopy's geodesic distance function, which accounts for Earth's curvature to provide accurate distances even for longer transects.

The resulting distance metric serves as the independent variable for our spectral analysis, representing the spatial dimension along which we'll analyze variations in dynamic height. This is particularly important because:

  • It establishes a consistent spatial reference system for the transect
  • It enables the interpretation of spectral results in terms of physical wavelengths
  • It facilitates the comparison of spectral characteristics across different measurement platforms

By converting from geographic coordinates to distance along track, we transform the data into a form suitable for one-dimensional spectral analysis, enabling the identification of characteristic spatial scales in the oceanographic features observed along the transect.

In [ ]:
### 6.2.2 MVP transect dataframe and distance along track
        n_profiles = SA.shape[0]
        if lon_mvp.shape[1] < n_profiles:
            raise ValueError(f"PL{PL}: lon_mvp dimension mismatch")
        df_mvp_raw = pd.DataFrame({
            'dh':  dh_array,
            'dhc': dhc,
            'lon': lon_mvp[0, :n_profiles],
            'lat': lat_mvp[0, :n_profiles]
        })
        dist_along = np.zeros(n_profiles)
        for i in range(1, n_profiles):
            dist_along[i] = dist_along[i-1] + geopy.distance.distance(
                (df_mvp_raw['lat'].values[i-1], df_mvp_raw['lon'].values[i-1]),
                (df_mvp_raw['lat'].values[i],   df_mvp_raw['lon'].values[i])
            ).km
        df_mvp_raw['dist'] = dist_along
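geopy's geodesic distance accounts for Earth's ellipsoidal shape; for illustration, the same cumulative-distance logic can be sketched with a spherical haversine approximation, which needs only NumPy. The coordinates below are arbitrary test points, not campaign data.

```python
import numpy as np

def haversine_km(lat1, lon1, lat2, lon2, R=6371.0):
    """Great-circle distance in km on a spherical Earth of radius R."""
    p1, p2 = np.radians(lat1), np.radians(lat2)
    a = (np.sin((p2 - p1) / 2) ** 2
         + np.cos(p1) * np.cos(p2) * np.sin(np.radians(lon2 - lon1) / 2) ** 2)
    return 2 * R * np.arcsin(np.sqrt(a))

# Three stations going due north in 0.5-degree steps.
lats = np.array([42.0, 42.5, 43.0])
lons = np.array([5.0, 5.0, 5.0])

dist_along = np.zeros(len(lats))
for i in range(1, len(lats)):
    dist_along[i] = dist_along[i - 1] + haversine_km(lats[i - 1], lons[i - 1],
                                                     lats[i], lons[i])
# One degree of latitude is roughly 111 km, so dist_along[-1] ≈ 111 km.
```

The spherical result differs from the geodesic one by well under a percent at these scales, which is why either metric yields essentially the same along-track axis for spectral analysis.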

6.2.3 Regular 1 km Re-sampling (No Smoothing)¶

This step transforms the irregularly spaced MVP measurements into a regular grid with uniform 1 km spacing. Unlike in previous analyses, no smoothing is applied here, preserving the high-frequency content essential for spectral analysis.

The resampling process involves:

  1. Regular distance grid creation: The code creates a uniform distance grid from 0 to the maximum transect length in 1 km increments:

    dist_reg = np.arange(0, dist_max + step_km, step_km)
    
  2. Linear interpolation: All variables (longitude, latitude, dynamic height) are interpolated onto this regular grid using simple linear interpolation:

    lon_reg = np.interp(dist_reg, df_mvp_raw['dist'], df_mvp_raw['lon'])
    
  3. Dataframe creation: The interpolated values are organized into a new DataFrame with regular spacing.

This regular resampling is crucial for spectral analysis for several reasons:

  • Uniform sampling: Fourier-based spectral methods assume uniform sampling intervals
  • Spectral integrity: The 1 km spacing preserves variations down to 2 km wavelengths (Nyquist limit)
  • Direct interpretation: With uniform spacing, frequency directly translates to spatial wavelength

Importantly, this step applies only interpolation without smoothing. This contrasts with our previous validation analyses where smoothing helped reduce noise. For spectral analysis, we deliberately preserve the high-frequency content because:

  1. The spectral analysis itself will reveal the noise characteristics
  2. Smoothing would artificially reduce energy at higher frequencies
  3. We want to compare the true spectral characteristics of each dataset

With this regular sampling, we've prepared the MVP data for spectral analysis while preserving its full frequency content. This will allow us to accurately assess how energy is distributed across different spatial scales in the original measurements.

In [ ]:
### 6.2.3 Regular 1 km re-sampling (no smoothing)
        step_km  = 1
        dist_max = df_mvp_raw['dist'].iloc[-1]
        dist_reg = np.arange(0, dist_max + step_km, step_km)
        lon_reg  = np.interp(dist_reg, df_mvp_raw['dist'], df_mvp_raw['lon'])
        lat_reg  = np.interp(dist_reg, df_mvp_raw['dist'], df_mvp_raw['lat'])
        dh_reg   = np.interp(dist_reg, df_mvp_raw['dist'], df_mvp_raw['dh'])
        dhc_reg  = np.interp(dist_reg, df_mvp_raw['dist'], df_mvp_raw['dhc'])
        df_mvp = pd.DataFrame({
            'dist': dist_reg,
            'lon':  lon_reg,
            'lat':  lat_reg,
            'dh':   dh_reg,
            'dhc':  dhc_reg
        })
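The resampling step can be traced on a short irregular track (the sample distances and dynamic heights below are made up):

```python
import numpy as np

# Irregularly spaced along-track samples (km) and dynamic height (m).
dist = np.array([0.0, 0.7, 1.9, 3.2, 4.1])
dh   = np.array([0.10, 0.12, 0.09, 0.15, 0.11])

step_km  = 1
dist_reg = np.arange(0, dist[-1] + step_km, step_km)   # 0, 1, 2, 3, 4, 5 km
dh_reg   = np.interp(dist_reg, dist, dh)               # linear, no smoothing

# Note: np.interp clamps beyond the last sample, so dh_reg[-1] repeats dh[-1].
```

The 1 km grid point at `dist_reg[1]` falls between the samples at 0.7 km and 1.9 km and takes the linearly weighted value, preserving local variations without any low-pass filtering.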

6.2.4 External ADT Colocation (SWOT & DUACS)¶

This section matches satellite altimetry data from both SWOT and DUACS to the MVP transect points, enabling direct spectral comparison between in-situ and satellite measurements. The colocation process ensures that all three datasets sample the same oceanographic features along the transect.

For each transect, the code:

  1. Extracts the date from MVP measurements to identify matching satellite data
  2. Loads and interpolates SWOT data onto the MVP positions using the load_SWOT function
  3. Similarly processes DUACS data using the load_DUACS function
  4. Centers both satellite datasets by subtracting their respective means

The centering operation is particularly important for spectral analysis as it:

  • Removes constant offsets that don't contribute to the spectral characteristics
  • Focuses on the spatial variations that reveal oceanographic features
  • Makes the different datasets directly comparable despite their different absolute references

The code includes robust error handling with fallback to NaN arrays if satellite data is unavailable for the transect date. This ensures that missing satellite coverage doesn't halt the processing of valid MVP data.

This three-way colocation creates a powerful validation framework where we can directly compare:

  • MVP measurements (high-resolution in-situ reference)
  • SWOT data (new wide-swath altimetry technology)
  • DUACS data (established multi-mission altimetry product)

By analyzing the spectral characteristics of these colocated measurements, we can assess how well each satellite product captures the spatial variations observed in the in-situ data across different scales, with particular focus on SWOT's performance in resolving submesoscale features.

In [ ]:
### 6.2.4 External ADT colocation (SWOT & DUACS)
        dayv_str = dates_mvp[0].strftime('%Y-%m-%d')
        adt_swot = load_SWOT(df_mvp, dayv_str,
                             '/home/favaloro/Téléchargements/SWOT_V2.0.0/v2.0.0/', '')
        adt_swot = np.full(len(df_mvp), np.nan) if adt_swot is None else adt_swot - np.nanmean(adt_swot)

        adt_duacs = load_DUACS(df_mvp, dayv_str,
                               '/home/bessiere/Documents/DUACS/', '')
        adt_duacs = np.full(len(df_mvp), np.nan) if adt_duacs is None else adt_duacs - np.nanmean(adt_duacs)

6.2.5 PSD Computation and Storage¶

With all three datasets prepared and aligned, this step computes the power spectral density (PSD) for each valid segment along the transect. This reveals how energy is distributed across different spatial scales in the MVP, SWOT, and DUACS measurements.

For each dataset, the code:

  1. Calls compute_segments_psd to identify continuous segments and calculate their PSDs
  2. Uses consistent parameters across all datasets:

    • max_nan=4: Splits segments where a run of 4 consecutive missing values is reached, tolerating shorter gaps
    • fs=1.0: Sets sampling frequency to 1 sample per km
    • min_length=MIN_SEGMENT_LENGTH: Ensures segments have at least 10 valid points
  3. Stores the resulting frequency-PSD pairs in the global accumulators initialized earlier

This segmented approach to spectral estimation is particularly valuable for oceanographic data because:

  • It maximizes data utilization by processing each valid segment independently
  • It handles discontinuities that naturally occur in real-world measurements
  • It generates multiple independent spectral estimates that can later be ensemble-averaged

The frequency range of our analysis is determined by the sampling parameters:

  • Nyquist frequency: 0.5 cycles/km (corresponding to 2 km wavelength)
  • Lowest resolved frequency: approximately 1/(segment length) cycles/km

By accumulating spectral estimates from all transects, we build a comprehensive picture of how energy is distributed across spatial scales in each measurement system. This will reveal important characteristics such as:

  • The effective resolution of each measurement system
  • The noise floor at high frequencies
  • The spectral slope indicating how energy cascades across scales

These spectral characteristics are fundamental to understanding the oceanographic processes captured by each system and evaluating SWOT's ability to resolve fine-scale features.

In [ ]:
### 6.2.5 PSD computation and storage
        segments_psd_mvp = compute_segments_psd(df_mvp['dhc'].values, max_nan=4,
                                                fs=1.0, min_length=MIN_SEGMENT_LENGTH)
        for f, p in segments_psd_mvp:
            global_freq_mvp.append(f)
            global_psd_mvp.append(p)

        segments_psd_swot = compute_segments_psd(adt_swot, max_nan=4,
                                                 fs=1.0, min_length=MIN_SEGMENT_LENGTH)
        for f, p in segments_psd_swot:
            global_freq_swot.append(f)
            global_psd_swot.append(p)

        segments_psd_duacs = compute_segments_psd(adt_duacs, max_nan=4,
                                                  fs=1.0, min_length=MIN_SEGMENT_LENGTH)
        for f, p in segments_psd_duacs:
            global_freq_duacs.append(f)
            global_psd_duacs.append(p)

6.2.6 Coherence Computation and Storage¶

This final processing step computes the spectral coherence between each pair of datasets, providing a frequency-dependent measure of their linear relationship. Coherence analysis is particularly valuable as it reveals at which spatial scales the different measurement systems capture the same oceanographic signals.

The code computes three distinct coherence relationships:

  1. MVP-SWOT coherence: Evaluates how well SWOT captures the features observed by in-situ measurements
  2. MVP-DUACS coherence: Provides a reference for how well conventional altimetry performs
  3. SWOT-DUACS coherence: Reveals similarities and differences between the two satellite products

For each pair, the code:

  1. Uses compute_segments_coherence to identify valid segment pairs and calculate their coherence spectra
  2. Maintains consistent parameters across all comparisons:

    • max_nan=4: Splits segments where a run of 4 consecutive missing values is reached, tolerating shorter gaps
    • fs=1.0: Sets sampling frequency to 1 sample per km
    • min_length=MIN_SEGMENT_LENGTH: Ensures segments have at least 10 valid points
  3. Stores the resulting frequency-coherence pairs in the global accumulators

Coherence values range from 0 to 1, with interpretation:

  • Values near 1 indicate strong linear relationship at that frequency
  • Values near 0 indicate no linear relationship
  • Intermediate values suggest partial correlation or phase differences

This comprehensive coherence analysis framework enables several key evaluation metrics:

  • Scale-dependent validation: At which wavelengths do satellite and in-situ measurements agree?
  • Comparative performance: Does SWOT show improved coherence with MVP compared to DUACS?
  • Effective resolution: At what scale does coherence drop below acceptable levels?

The try-except block ensures that errors in processing individual transects don't halt the entire analysis, allowing the code to continue with the next transect and maximize data utilization.

By accumulating coherence estimates from all available transects, we build a statistically robust picture of the scale-dependent relationships between these different measurement systems.

In [ ]:
### 6.2.6 Coherence computation and storage
        segments_coh_mvp_swot = compute_segments_coherence(df_mvp['dhc'].values,
                                                           adt_swot, max_nan=4,
                                                           fs=1.0, min_length=MIN_SEGMENT_LENGTH)
        for f, co in segments_coh_mvp_swot:
            global_freq_coh_mvp_swot.append(f)
            global_coh_mvp_swot.append(co)

        segments_coh_mvp_duacs = compute_segments_coherence(df_mvp['dhc'].values,
                                                            adt_duacs, max_nan=4,
                                                            fs=1.0, min_length=MIN_SEGMENT_LENGTH)
        for f, co in segments_coh_mvp_duacs:
            global_freq_coh_mvp_duacs.append(f)
            global_coh_mvp_duacs.append(co)

        segments_coh_swot_duacs = compute_segments_coherence(adt_swot, adt_duacs,
                                                             max_nan=4, fs=1.0,
                                                             min_length=MIN_SEGMENT_LENGTH)
        for f, co in segments_coh_swot_duacs:
            global_freq_coh_swot_duacs.append(f)
            global_coh_swot_duacs.append(co)

    except Exception as e:
        print(f"Error in PL{PL}: {e}")
        continue

7. Ensemble Averaging¶

7.1 Ensemble-Average Helpers¶

This section implements the statistical methods needed to combine spectral estimates from multiple segments and transects into robust ensemble averages. Ensemble averaging is a fundamental technique in spectral analysis that reduces statistical variance while preserving the underlying spectral characteristics.

The code defines two parallel helper functions:

  1. ensemble_average_psd: Combines multiple PSD estimates into a single ensemble-averaged spectrum
  2. ensemble_average_coherence: Similarly combines multiple coherence estimates

Both functions follow the same core methodology:

  1. Establish a common frequency grid: Use the frequency points from the first spectrum as the reference grid
  2. Filter for positive frequencies only: Spectral analysis typically focuses on positive frequencies
  3. Interpolate all spectra onto the common grid: This ensures all estimates can be properly combined
  4. Calculate the nanmean across all estimates: This produces the final ensemble average, ignoring NaN values

The interpolation step is particularly important because:

  • Different segments may have different lengths, resulting in different frequency sampling
  • Ensuring all spectra are evaluated at the same frequency points is necessary for proper averaging
  • Linear interpolation preserves the spectral characteristics while aligning the frequency grids

Mathematically, the ensemble averaging process can be represented as:

$$\bar{P}(f) = \frac{1}{K} \sum_{i=1}^{K} P_i(f)$$

Where:

  • $\bar{P}(f)$ is the ensemble-averaged spectrum
  • $P_i(f)$ is the spectrum from the $i$-th segment
  • $K$ is the total number of segments

This ensemble averaging approach provides several key benefits:

  1. Variance reduction: The variance of the spectral estimate decreases proportionally to the number of segments averaged
  2. Bias preservation: The averaging does not affect the expected value of the spectrum
  3. Robustness: The resulting estimate is less sensitive to peculiarities of individual segments

By applying these ensemble averaging methods to our accumulated spectral and coherence estimates, we can generate statistically robust results that accurately characterize the spectral properties of our oceanographic measurements.

In [ ]:
## 7.1 Ensemble-average helpers
def ensemble_average_psd(freq_list, psd_list):
    """
    Return common-grid frequency and ensemble-mean PSD.
    """
    if len(freq_list) == 0:
        return None, None
    freq_common = freq_list[0][freq_list[0] > 0]
    psd_interp_all = [psd_list[0][freq_list[0] > 0]]
    for freq_, psd_ in zip(freq_list[1:], psd_list[1:]):
        valid = freq_ > 0
        psd_interp = psd_[valid] if np.array_equal(freq_[valid], freq_common) else \
                     np.interp(freq_common, freq_[valid], psd_[valid],
                               left=np.nan, right=np.nan)
        psd_interp_all.append(psd_interp)
    return freq_common, np.nanmean(np.array(psd_interp_all), axis=0)

def ensemble_average_coherence(freq_list, coh_list):
    """
    Return common-grid frequency and ensemble-mean coherence.
    """
    if len(freq_list) == 0:
        return None, None
    freq_common = freq_list[0][freq_list[0] > 0]
    coh_interp_all = [coh_list[0][freq_list[0] > 0]]
    for freq_, coh_ in zip(freq_list[1:], coh_list[1:]):
        valid = freq_ > 0
        coh_interp = coh_[valid] if np.array_equal(freq_[valid], freq_common) else \
                     np.interp(freq_common, freq_[valid], coh_[valid],
                               left=np.nan, right=np.nan)
        coh_interp_all.append(coh_interp)
    return freq_common, np.nanmean(np.array(coh_interp_all), axis=0)
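The common-grid interpolation and averaging above can be traced on two toy spectra (numbers chosen by hand; the second grid deliberately differs so the interpolation branch is exercised):

```python
import numpy as np

freq_list = [np.array([0.0, 0.1, 0.2, 0.3]),
             np.array([0.0, 0.05, 0.15, 0.25, 0.35])]
psd_list  = [np.array([9.0, 4.0, 2.0, 1.0]),
             np.array([9.0, 5.0, 3.0, 1.5, 0.75])]

freq_common = freq_list[0][freq_list[0] > 0]           # [0.1, 0.2, 0.3]
interp_all  = [psd_list[0][freq_list[0] > 0]]
for f_, p_ in zip(freq_list[1:], psd_list[1:]):
    v = f_ > 0                                         # drop the f = 0 bin
    interp_all.append(np.interp(freq_common, f_[v], p_[v],
                                left=np.nan, right=np.nan))
psd_mean = np.nanmean(np.array(interp_all), axis=0)
# psd_mean = [4.0, 2.125, 1.0625]
```

Frequencies outside the second spectrum's range would become NaN (`left`/`right` arguments) and be ignored by `nanmean`, so grids of different extents can still be combined.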

7.2 Ensemble PSD and Coherence Computation¶

After processing all transects, this section applies the ensemble averaging methods to our accumulated spectral estimates, generating the final power spectral density (PSD) and coherence spectra that will form the basis of our analysis.

The ensemble averaging is applied to six distinct spectral quantities:

  1. Three PSD estimates:

    • MVP dynamic height spectrum (in-situ reference)
    • SWOT ADT spectrum (new wide-swath altimetry)
    • DUACS ADT spectrum (conventional multi-mission altimetry)
  2. Three coherence relationships:

    • MVP-SWOT coherence (primary validation metric)
    • MVP-DUACS coherence (reference for conventional altimetry)
    • SWOT-DUACS coherence (satellite intercomparison)

Each ensemble average combines results from multiple segments across all available transects, providing statistically robust spectral estimates that represent the general characteristics of the Northwestern Mediterranean region.

The final conversion from m² to cm² (multiplying by 10⁴) is an important scaling step that:

  • Expresses sea surface height variations in more intuitive units (centimeters)
  • Aligns with common conventions in oceanographic literature
  • Makes the PSD values more readable when plotted (avoiding very small numbers)

This unit conversion doesn't affect the spectral shape or coherence values, only the absolute magnitude of the PSD.
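This invariance is easy to demonstrate: multiplying a spectrum by a constant shifts its log-log intercept but not its slope, and coherence is normalized so it is unaffected entirely. A quick self-contained check on a synthetic power law (illustrative values only):

```python
import numpy as np

freq = np.logspace(-3, 0, 50)           # spatial frequency, cyc/km
psd_m2 = 2.5e-3 * freq**(-2.0)          # synthetic f^-2 spectrum, m² units
psd_cm2 = psd_m2 * 1.0e4                # same spectrum rescaled to cm² units

# Log-log slopes before and after the unit conversion are identical;
# only the intercept (absolute level) changes
slope_m2, icpt_m2 = np.polyfit(np.log10(freq), np.log10(psd_m2), 1)
slope_cm2, icpt_cm2 = np.polyfit(np.log10(freq), np.log10(psd_cm2), 1)
print(slope_m2, slope_cm2)              # both ≈ -2
print(icpt_cm2 - icpt_m2)               # ≈ 4 (the factor 10⁴ in log10)
```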

These ensemble-averaged spectra enable several key analyses:

  1. Comparison of energy distribution across scales between different measurement systems
  2. Identification of spectral slopes indicating energy cascade characteristics
  3. Assessment of scale-dependent coherence between in-situ and satellite measurements
  4. Evaluation of SWOT's performance relative to conventional altimetry

The results will provide comprehensive insight into SWOT's ability to resolve oceanographic features across different spatial scales, particularly in the submesoscale range that has traditionally been challenging for satellite altimetry.

In [ ]:
## 7.2 Ensemble PSD and coherence computation
freq_mvp_ens,   psd_mvp_ens   = ensemble_average_psd(global_freq_mvp,   global_psd_mvp)
freq_swot_ens,  psd_swot_ens  = ensemble_average_psd(global_freq_swot,  global_psd_swot)
freq_duacs_ens, psd_duacs_ens = ensemble_average_psd(global_freq_duacs, global_psd_duacs)

freq_coh_ens_mvp_swot,  coh_ens_mvp_swot  = ensemble_average_coherence(global_freq_coh_mvp_swot,  global_coh_mvp_swot)
freq_coh_ens_mvp_duacs, coh_ens_mvp_duacs = ensemble_average_coherence(global_freq_coh_mvp_duacs, global_coh_mvp_duacs)
freq_coh_ens_swot_duacs, coh_ens_swot_duacs = ensemble_average_coherence(global_freq_coh_swot_duacs, global_coh_swot_duacs)

# Convert PSD from m² / (cyc/km) to cm² / (cyc/km)
if psd_mvp_ens  is not None: psd_mvp_ens  *= 1.0e4
if psd_swot_ens is not None: psd_swot_ens *= 1.0e4
if psd_duacs_ens is not None: psd_duacs_ens *= 1.0e4

8. Visualization¶

8.1 Complete Figure – Ensemble PSD and Coherence¶

8.1.1 Figure and Axes¶

The visualization section begins with the creation of a comprehensive figure that will display the power spectral density (PSD) and coherence results. This figure is designed to provide a complete picture of the spectral characteristics of our three measurement systems across different spatial scales.

The code initializes a figure with dimensions 8×6 inches, providing sufficient space to clearly display the complex spectral information while maintaining readability. A single subplot is created to contain all spectral curves, coherence relationships, and background elements.

This unified presentation approach offers several advantages:

  1. It allows direct visual comparison between different spectral estimates
  2. It enables immediate assessment of scale-dependent relationships
  3. It provides a comprehensive view that facilitates interpretation of the results

The figure will incorporate multiple elements:

  • PSD curves for MVP, SWOT, and DUACS
  • Coherence curves for the three pairwise relationships
  • Background shading to indicate different scale regimes
  • Reference slope lines for theoretical spectral behavior
  • Dual axes for both frequency and wavelength representations

This integrated visualization approach is particularly valuable for spectral analysis, as it allows us to simultaneously assess energy distribution, scaling behavior, and inter-dataset relationships across the full range of spatial scales represented in our data.

In [ ]:
### 8.1.1 Figure and Axes
fig = plt.figure(figsize=(8,6))
ax = fig.add_subplot(111)

8.1.2 Spatial-Scale Background¶

This visualization element establishes a clear reference framework for interpreting the spectral results by dividing the frequency domain into three distinct oceanographic scale regimes. These colored background panels serve as both a visual guide and a conceptual framework, helping to contextualize the spectral characteristics within established oceanographic paradigms.

The three scale regimes are:

  1. Large scale (>100 km) - Yellow background:

    • Frequency range: < 0.01 cycles/km
    • Corresponds to basin-scale circulation features and large current systems
    • Typically well-captured by conventional satellite altimetry
    • Dominated by geostrophic dynamics with slow evolution
  2. Mesoscale (10-100 km) - Green background:

    • Frequency range: 0.01-0.1 cycles/km
    • Encompasses eddies, meanders, and frontal systems
    • Contains the majority of ocean kinetic energy
    • Primary target for SWOT's improved resolution compared to conventional altimetry
  3. Small scale (<10 km) - Red background:

    • Frequency range: >0.1 cycles/km
    • Includes submesoscale eddies, filaments, and internal waves
    • Region where ageostrophic dynamics become increasingly important
    • Tests the limits of SWOT's resolving capability

These scale definitions align with established oceanographic conventions and focus attention on the critical mesoscale-to-submesoscale transition region (around 10-30 km) where SWOT is expected to show significant improvements over conventional altimetry.

By using subtle background shading (alpha=0.1) with low z-order (zorder=0), the code ensures these reference panels don't interfere with the spectral curves while providing important context for interpreting the results. This visual framework will help readers immediately recognize which features fall within each dynamical regime and facilitate discussion of scale-dependent satellite performance.
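These boundaries can be restated as a tiny helper (purely illustrative, not used by the notebook) that maps a feature's wavelength to its regime:

```python
def scale_regime(wavelength_km):
    """Classify a spatial wavelength using the regime boundaries above."""
    if wavelength_km > 100:
        return "large scale"     # f < 0.01 cyc/km
    if wavelength_km >= 10:
        return "mesoscale"       # 0.01 <= f <= 0.1 cyc/km
    return "small scale"         # f > 0.1 cyc/km

print(scale_regime(250))  # large scale
print(scale_regime(30))   # mesoscale
print(scale_regime(5))    # small scale
```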

In [ ]:
### 8.1.2 Spatial‐scale Background
ax.axvspan(1e-6, 0.01, color='yellow', alpha=0.1, zorder=0, label='Large scale (>100 km)')
ax.axvspan(0.01, 0.1, color='green', alpha=0.1, zorder=0, label='Mesoscale (10-100 km)')
ax.axvspan(0.1, 1e2, color='red', alpha=0.1, zorder=0, label='Small scale (<10 km)')

8.1.3 Helper for PSD Lines¶

This section defines a utility function for plotting power spectral density curves and establishes a consistent color scheme for the different datasets. These elements ensure visual clarity and consistency throughout the visualization.

The plot_psd_line function serves several important purposes:

  1. Data validation: It filters out invalid data points (NaNs and non-positive values) that could cause errors in logarithmic plotting
  2. Error handling: It returns None when there are insufficient valid points for meaningful visualization
  3. Consistent formatting: It applies the same logarithmic scaling and styling to all spectral curves
  4. Return value: It provides the line object for later reference in the legend

The color selection uses a scientific approach based on the viridis colormap, which is designed to:

  • Be perceptually uniform across its range
  • Remain distinguishable for people with color vision deficiencies
  • Maintain clarity when printed in grayscale

By sampling three points from this colormap (at positions 0.3, 0.5, and 0.7), the code creates a visually harmonious set of colors that clearly differentiate the three datasets while maintaining a cohesive aesthetic.

The resulting color assignments are:

  • color_mvp: For in-situ MVP measurements (perceptually the "first" color)
  • color_swot: For SWOT satellite data (perceptually the "middle" color)
  • color_duacs: For DUACS multi-mission altimetry (perceptually the "last" color)

This systematic approach to color selection enhances the readability of the complex multi-curve visualization while ensuring aesthetic quality, accessibility, and scientific rigor.

In [ ]:
### 8.1.3 Helper for PSD Lines
def plot_psd_line(freq, psd, color, label):
    if freq is not None and psd is not None:
        valid = (~np.isnan(freq)) & (~np.isnan(psd)) & (psd > 0)
        if np.sum(valid) < 2:
            return None
        line, = ax.loglog(freq[valid], psd[valid], color=color, linestyle='-', label=label)
        return line
    return None

colors = plt.cm.viridis(np.linspace(0.3, 0.7, 3))  # colour map sampling
color_mvp, color_swot, color_duacs = colors[0], colors[1], colors[2]

8.1.4 PSD Curves¶

This section plots the ensemble-averaged power spectral density curves for all three measurement systems. These curves represent the core results of our spectral analysis, revealing how energy is distributed across different spatial scales in each dataset.

The code uses the previously defined plot_psd_line helper function to plot three distinct spectral curves:

  1. MVP PSD (in the first viridis color):

    • Represents the spectral characteristics of in-situ dynamic height measurements
    • Serves as the ground-truth reference for evaluating satellite performance
    • Typically shows higher energy at small scales compared to satellite measurements
  2. SWOT PSD (in the middle viridis color):

    • Shows the spectral characteristics of SWOT's wide-swath altimetry
    • The primary focus of our validation analysis
    • Expected to capture more small-scale energy than conventional altimetry
  3. DUACS PSD (in the third viridis color):

    • Represents conventional multi-mission altimetry
    • Provides a baseline for comparing SWOT's performance
    • Typically shows stronger filtering at higher frequencies

These PSD curves are plotted on logarithmic axes (using loglog), which is the standard approach for spectral analysis because:

  • It allows visualization across multiple orders of magnitude
  • It makes power-law relationships appear as straight lines
  • It emphasizes relative scaling behavior rather than absolute differences

By examining these curves, we can assess several key aspects:

  • The relative energy levels at different scales
  • The spectral slopes indicating energy cascade characteristics
  • The frequency at which satellite measurements begin to diverge from in-situ observations
  • The effective resolution limit where noise begins to dominate the signal

These insights are crucial for understanding SWOT's capabilities and limitations in capturing oceanographic features across different spatial scales.

In [ ]:
### 8.1.4 PSD Curves
line_mvp   = plot_psd_line(freq_mvp_ens,   psd_mvp_ens,   color_mvp,   'MVP PSD')
line_swot  = plot_psd_line(freq_swot_ens,  psd_swot_ens,  color_swot,  'SWOT PSD')
line_duacs = plot_psd_line(freq_duacs_ens, psd_duacs_ens, color_duacs, 'DUACS PSD')

8.1.5 Log–Log Regression Lines¶

This section enhances the visualization by adding best-fit power-law trend lines to each PSD curve. These regression lines quantify the spectral slopes, which are key indicators of the underlying physical processes and energy cascade characteristics in the ocean.

For each dataset (MVP, SWOT, and DUACS), the code:

  1. Computes the best-fit power-law using the loglog_regression function
  2. Plots the resulting regression line using the same color as the original PSD but with a dashed style
  3. Includes the numerical slope value in the label for quantitative reference

In oceanographic spectral analysis, these spectral slopes have specific physical interpretations:

  • Slope ≈ -5/3 (-1.67): Indicates three-dimensional isotropic turbulence (Kolmogorov spectrum)
  • Slope ≈ -2: Associated with surface quasi-geostrophic (SQG) turbulence
  • Slope ≈ -3: Characteristic of quasi-geostrophic (QG) turbulence in the forward energy cascade
  • Steeper slopes: May indicate strong filtering or noise suppression

By comparing the fitted slopes across the three datasets, we can assess:

  1. Physical consistency: Whether satellite measurements capture the same energy scaling as in-situ observations
  2. Processing effects: How satellite data processing affects spectral characteristics
  3. Scale dependence: Whether the slopes vary across different frequency ranges

The code handles potential errors by checking for valid data and only plotting regression lines when meaningful fits can be obtained. The resulting handles are collected in handles_slopes for later inclusion in the legend.

These regression lines provide a quantitative foundation for interpreting the spectral results, moving beyond visual comparison to precise characterization of the energy distribution across scales in each measurement system.
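loglog_regression is defined earlier in the notebook; a minimal sketch consistent with how it is called here — returning a slope m and intercept c such that PSD ≈ 10^c · f^m, and (nan, nan) when no meaningful fit exists — might look like:

```python
import numpy as np

def loglog_regression(freq, psd):
    """Least-squares fit of log10(psd) against log10(freq).

    Returns (m, c) with psd ≈ 10**c * freq**m, or (nan, nan) if fewer
    than two valid (finite, positive) points remain.
    """
    freq = np.asarray(freq, dtype=float)
    psd = np.asarray(psd, dtype=float)
    valid = np.isfinite(freq) & np.isfinite(psd) & (freq > 0) & (psd > 0)
    if np.sum(valid) < 2:
        return np.nan, np.nan
    m, c = np.polyfit(np.log10(freq[valid]), np.log10(psd[valid]), 1)
    return m, c

# Sanity check: recover the imposed slope of a synthetic f^-3 spectrum
f = np.logspace(-3, 0, 100)
m, c = loglog_regression(f, 4.0 * f**(-3.0))
print(round(m, 3))   # recovers the imposed slope of -3
```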

In [ ]:
### 8.1.5 Log–log Regression Lines
handles_slopes = []
if freq_mvp_ens is not None and psd_mvp_ens is not None:
    m_mvp, c_mvp = loglog_regression(freq_mvp_ens, psd_mvp_ens)
    if not np.isnan(m_mvp):
        reg_line_mvp, = ax.loglog(freq_mvp_ens, 10**c_mvp * freq_mvp_ens**m_mvp, 
                                  color=color_mvp, linestyle='--',
                                  label=f"MVP slope = {m_mvp:.2f}")
        handles_slopes.append(reg_line_mvp)

if freq_swot_ens is not None and psd_swot_ens is not None:
    m_swot, c_swot = loglog_regression(freq_swot_ens, psd_swot_ens)
    if not np.isnan(m_swot):
        reg_line_swot, = ax.loglog(freq_swot_ens, 10**c_swot * freq_swot_ens**m_swot, 
                                   color=color_swot, linestyle='--',
                                   label=f"SWOT slope = {m_swot:.2f}")
        handles_slopes.append(reg_line_swot)

if freq_duacs_ens is not None and psd_duacs_ens is not None:
    m_duacs, c_duacs = loglog_regression(freq_duacs_ens, psd_duacs_ens)
    if not np.isnan(m_duacs):
        reg_line_duacs, = ax.loglog(freq_duacs_ens, 10**c_duacs * freq_duacs_ens**m_duacs, 
                                    color=color_duacs, linestyle='--',
                                    label=f"DUACS slope = {m_duacs:.2f}")
        handles_slopes.append(reg_line_duacs)

8.1.6 Reference-Slope Lines¶

This section adds theoretical reference slopes to the plot, providing important context for interpreting the observed spectral characteristics in relation to established oceanographic turbulence theories.

The code adds two reference lines with specific slopes:

  1. Slope -3 (black dotted line):

    • Represents the theoretical spectral slope for quasi-geostrophic (QG) turbulence
    • Expected in the forward energy cascade of balanced geostrophic flows
    • Classical model for mesoscale ocean dynamics (50-200 km)
  2. Slope -2 (gray dotted line):

    • Represents the theoretical slope for surface quasi-geostrophic (SQG) dynamics
    • Relevant for submesoscale processes with active surface buoyancy
    • Often observed in the transition between mesoscale and submesoscale regimes

These reference lines are carefully positioned to align with the observed data:

  • Their amplitude is scaled to 80% of the first MVP PSD value for visual clarity
  • They span the full frequency range of interest (10^-3 to 1 cycles/km)
  • They use dotted lines to distinguish them from the empirical regression lines

By including these theoretical references, the visualization enables direct comparison between observed spectral slopes and theoretical predictions. This comparison helps assess:

  1. Which dynamical regime dominates at different scales
  2. Whether satellite measurements capture the expected physical scaling
  3. At what scales transitions between different dynamical regimes occur

This approach connects the empirical spectral analysis to fundamental oceanographic theory, providing a deeper physical interpretation of the results. The reference lines are particularly valuable for evaluating whether SWOT captures the theoretically expected energy cascade characteristics across the mesoscale-to-submesoscale transition.

In [ ]:
### 8.1.6 Reference‐slope Lines
f_theory = np.linspace(1e-3, 1, 200)
ref_val = psd_mvp_ens[0] * 0.8 if (psd_mvp_ens is not None) and (len(psd_mvp_ens) > 0) else 1e4
theory_line_m3, = ax.loglog(f_theory, ref_val * (f_theory/f_theory[0])**(-3), 
                            color='black', linestyle=':', label="Ref slope -3")
theory_line_m2, = ax.loglog(f_theory, ref_val * (f_theory/f_theory[0])**(-2), 
                            color='gray', linestyle=':', label="Ref slope -2")
handles_slopes.extend([theory_line_m3, theory_line_m2])

8.1.7 Secondary X-Axis (Wavelength)¶

This section adds a secondary x-axis that displays spatial wavelength instead of frequency, making the plot more intuitive for oceanographers who typically think in terms of feature size rather than spatial frequency.

While the primary x-axis represents frequency in cycles per kilometer, the secondary axis shows the corresponding wavelength in kilometers. These are inversely related:

$$\text{Wavelength} = \frac{1}{\text{Frequency}}$$

The implementation uses matplotlib's secondary_xaxis feature with conversion functions:

  • From frequency to wavelength: lambda f: 1/f
  • From wavelength to frequency: lambda w: 1/w

This dual representation offers several advantages:

  1. Intuitive interpretation: Oceanographers can directly read off the spatial scales of features (e.g., "20 km mesoscale eddies" rather than "0.05 cycles/km")

  2. Physical relevance: Wavelength directly corresponds to the physical size of oceanographic structures, making it easier to relate spectral features to real-world phenomena

  3. Comparison with theory: Many oceanographic theories and models are expressed in terms of characteristic length scales rather than frequencies

The secondary axis is positioned at the top of the plot, with a clear label "Wavelength (km)" to distinguish it from the primary frequency axis. This arrangement allows readers to seamlessly translate between frequency and wavelength domains while interpreting the spectral results.

With this dual-axis approach, the visualization becomes more accessible to oceanographers from different backgrounds, whether they're more comfortable thinking in terms of frequencies (common in signal processing) or wavelengths (common in physical oceanography).
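The two conversion functions are each other's inverse, which is exactly what secondary_xaxis requires; a couple of spot checks of the mapping:

```python
# The same lambdas passed to secondary_xaxis in the cell below
freq_to_wavelength = lambda f: 1.0 / f
wavelength_to_freq = lambda w: 1.0 / w

print(freq_to_wavelength(0.05))   # 0.05 cyc/km corresponds to 20 km features
print(freq_to_wavelength(0.01))   # the 100 km mesoscale boundary
print(wavelength_to_freq(20.0))   # round trip back to 0.05 cyc/km
```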

In [ ]:
### 8.1.7 Secondary X-axis (Wavelength)
secax = ax.secondary_xaxis('top', functions=(lambda f: 1/f, lambda w: 1/w))
secax.set_xlabel('Wavelength (km)')

8.1.8 Axes Formatting¶

This section finalizes the core axes formatting, ensuring the figure is scientifically accurate, visually clear, and properly labeled. These formatting choices are crucial for effective communication of the spectral analysis results.

The code applies several important formatting elements:

  1. Axis labels:

    • X-axis: "Frequency (cyc/km)" - Indicates that the horizontal axis represents spatial frequency in cycles per kilometer
    • Y-axis: "PSD [cm²/cyc/km]" - Specifies the power spectral density units, with height variance expressed in cm²
  2. Title: "Ensemble PSD + Coherence (MVP, SWOT, DUACS)" - Clearly identifies the content and data sources in the figure

  3. Grid: A subtle grid is added with which='both' to show grid lines at both major and minor tick marks, using dashed lines with 40% opacity for reference without visual distraction

  4. Logarithmic scales: Both axes use logarithmic scaling (set_xscale('log') and set_yscale('log')), which is standard for spectral analysis as it:

    • Allows visualization across multiple orders of magnitude
    • Makes power-law relationships appear as straight lines
    • Emphasizes relative ratios rather than absolute differences
  5. X-axis limits: The frequency range is set to 10⁻³ to 1 cycles/km, corresponding to wavelengths from 1 km to 1000 km. This range encompasses:

    • The full mesoscale range (10-100 km)
    • The larger submesoscale features (1-10 km)
    • Some of the large-scale ocean features (>100 km)

These formatting choices ensure that the spectral results are presented in a scientifically rigorous manner while maintaining visual clarity. The consistent use of units and clear labeling helps readers interpret the complex information contained in the spectral curves and coherence relationships.

In [ ]:
### 8.1.8 Axes Formatting
ax.set_xlabel("Frequency (cyc/km)")
ax.set_ylabel("PSD [cm²/cyc/km]")
ax.set_title("Ensemble PSD + Coherence (MVP, SWOT, DUACS)")
ax.grid(True, which='both', linestyle='--', alpha=0.4)
ax.set_xscale('log')
ax.set_yscale('log')
ax.set_xlim(1e-3, 1)

8.1.9 Secondary Y-Axis for Coherence¶

This section adds a secondary y-axis to display the spectral coherence relationships alongside the power spectra. This dual-axis approach allows direct comparison of both the energy distribution and the scale-dependent linear relationships between different measurement systems.

The secondary y-axis is created using ax.twinx(), which shares the same x-axis but has an independent y-scale. It is configured to display coherence values that range from 0 to 1:

  • The axis limits are set to 0-2, so the coherence curves (which lie in the 0-1 range) occupy the lower half of the panel without overlapping the PSD curves
  • Tick marks are placed at 0, 0.5, 1, 1.5, and 2, but only the first three are labeled, so the visible scale reads 0-1
  • The axis is clearly labeled as "Coherence (0-1)"

Three coherence relationships are plotted, each with a distinct dark color:

  1. MVP-SWOT coherence (dark blue):

    • Quantifies agreement between in-situ measurements and SWOT
    • Primary metric for SWOT validation
  2. MVP-DUACS coherence (dark green):

    • Shows agreement between in-situ measurements and conventional altimetry
    • Serves as baseline for comparing SWOT's performance
  3. SWOT-DUACS coherence (dark red):

    • Reveals similarities and differences between the two satellite products
    • Helps identify satellite-specific features versus common satellite characteristics

For each relationship, the code:

  • Filters for valid data points (non-NaN, positive frequencies)
  • Uses semilogx plotting (logarithmic x-axis, linear y-axis) appropriate for coherence
  • Assigns high z-order (3) to ensure coherence curves remain visible
  • Captures the line handle for inclusion in the legend

This integrated visualization approach allows several key analyses:

  • Identifying frequencies where satellite measurements show strong or weak coherence with in-situ data
  • Comparing SWOT's coherence with MVP against the DUACS baseline
  • Correlating drops in coherence with features in the power spectra

The coherence curves provide crucial context for interpreting the spectral results, revealing not just how energy is distributed across scales in each dataset, but how consistently the different measurement systems capture the same oceanographic signals.
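For reference, the magnitude-squared coherence plotted here is the normalized cross-spectrum, γ²(f) = |P_xy(f)|² / (P_xx(f) P_yy(f)), which lies between 0 and 1 by construction. The notebook's own estimator is defined upstream; a minimal numpy-only sketch of segment-averaged coherence (averaging is essential — a single-segment estimate is identically 1) might look like:

```python
import numpy as np

def welch_coherence(x, y, nperseg):
    """Magnitude-squared coherence from non-overlapping segment averages."""
    n_seg = len(x) // nperseg
    pxx = pyy = pxy = 0.0
    for k in range(n_seg):
        sl = slice(k * nperseg, (k + 1) * nperseg)
        fx = np.fft.rfft(x[sl] - np.mean(x[sl]))
        fy = np.fft.rfft(y[sl] - np.mean(y[sl]))
        pxx = pxx + np.abs(fx)**2
        pyy = pyy + np.abs(fy)**2
        pxy = pxy + fx * np.conj(fy)
    # Drop the (demeaned, near-zero) zero-frequency bin before normalizing
    return np.abs(pxy[1:])**2 / (pxx[1:] * pyy[1:])

rng = np.random.default_rng(1)
shared = rng.standard_normal(2048)              # signal present in both records
x = shared + 0.3 * rng.standard_normal(2048)    # plus independent noise
y = shared + 0.3 * rng.standard_normal(2048)
coh = welch_coherence(x, y, nperseg=256)
print(coh.mean())   # well above 0.5: x and y share most of their variance
```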

In [ ]:
### 8.1.9 Secondary Y-axis for Coherence
ax2 = ax.twinx()
ax2.set_ylim(0, 2)
ax2.set_yticks([0, 0.5, 1, 1.5, 2])
ax2.set_yticklabels(['0', '0.5', '1', '', ''])
ax2.set_ylabel('Coherence (0-1)')

coherence_colors = ['darkblue', 'darkgreen', 'darkred']
handles_coherence = []

# MVP–SWOT coherence
if freq_coh_ens_mvp_swot is not None and coh_ens_mvp_swot is not None:
    valid_coh = (~np.isnan(freq_coh_ens_mvp_swot)) & (~np.isnan(coh_ens_mvp_swot)) & (freq_coh_ens_mvp_swot > 0)
    if np.sum(valid_coh) > 1:
        line_coherence_mvp_swot, = ax2.semilogx(freq_coh_ens_mvp_swot[valid_coh],
                                               coh_ens_mvp_swot[valid_coh],
                                               color=coherence_colors[0],
                                               label='Coherence MVP-SWOT',
                                               zorder=3)
        handles_coherence.append(line_coherence_mvp_swot)

# MVP–DUACS coherence
if freq_coh_ens_mvp_duacs is not None and coh_ens_mvp_duacs is not None:
    valid_coh = (~np.isnan(freq_coh_ens_mvp_duacs)) & (~np.isnan(coh_ens_mvp_duacs)) & (freq_coh_ens_mvp_duacs > 0)
    if np.sum(valid_coh) > 1:
        line_coherence_mvp_duacs, = ax2.semilogx(freq_coh_ens_mvp_duacs[valid_coh],
                                                coh_ens_mvp_duacs[valid_coh],
                                                color=coherence_colors[1],
                                                label='Coherence MVP-DUACS',
                                                zorder=3)
        handles_coherence.append(line_coherence_mvp_duacs)

# SWOT–DUACS coherence
if freq_coh_ens_swot_duacs is not None and coh_ens_swot_duacs is not None:
    valid_coh = (~np.isnan(freq_coh_ens_swot_duacs)) & (~np.isnan(coh_ens_swot_duacs)) & (freq_coh_ens_swot_duacs > 0)
    if np.sum(valid_coh) > 1:
        line_coherence_swot_duacs, = ax2.semilogx(freq_coh_ens_swot_duacs[valid_coh],
                                                 coh_ens_swot_duacs[valid_coh],
                                                 color=coherence_colors[2],
                                                 label='Coherence SWOT-DUACS',
                                                 zorder=3)
        handles_coherence.append(line_coherence_swot_duacs)

8.1.10 Legend (All Elements)¶

This final section creates a comprehensive legend that unifies all elements of the visualization, ensuring clear identification of each component while maintaining an organized and readable presentation.

The legend construction follows a systematic approach:

  1. Element collection: The code gathers handles from all visual elements:

    • Background shading patches identifying scale regimes
    • PSD curves for MVP, SWOT, and DUACS
    • Regression lines showing spectral slopes
    • Coherence curves for different dataset pairs
  2. Legend positioning:

    • Placed at the bottom of the figure (loc='lower center')
    • Centered horizontally (bbox_to_anchor=(0.5, -0.25))
    • Arranged in three columns (ncol=3) for compact presentation
    • Separated from the plot area to avoid overlap (bottom=0.3)
  3. Layout adjustment:

    • tight_layout() optimizes the use of figure space
    • subplots_adjust(bottom=0.3) ensures room for the legend below the plot

This unified legend approach provides several benefits:

  1. Comprehensive reference: All visual elements are identified in one place
  2. Scale efficiency: The multi-column format minimizes vertical space requirements
  3. Visual consistency: Elements are grouped logically for easy reference

The resulting figure presents a complete and integrated view of the spectral characteristics of MVP, SWOT, and DUACS measurements, with clear identification of all components. This visualization enables comprehensive analysis of how well SWOT captures the spectral features observed in in-situ measurements across different spatial scales, with particular focus on the mesoscale-to-submesoscale transition region that is critical for understanding ocean dynamics.

Through this integrated presentation of power spectra, spectral slopes, and coherence relationships, the figure provides a detailed assessment of SWOT's spectral performance relative to both in-situ measurements and conventional altimetry.

In [ ]:
### 8.1.10 Legend (all elements)
handles_curves = [h for h in [line_mvp, line_swot, line_duacs] if h is not None]
patches_fondo = list(ax.patches)              # background shading patches
handles_all = patches_fondo + handles_curves + handles_slopes + handles_coherence
labels_all = [obj.get_label() for obj in handles_all]

fig.legend(handles_all, labels_all, loc='lower center',
           bbox_to_anchor=(0.5, -0.25), ncol=3, borderaxespad=0.)

plt.tight_layout()
plt.subplots_adjust(bottom=0.3)
plt.show()

8.2 Zoomed Figure – 0.01 ≤ f ≤ 1 cyc km⁻¹¶

8.2.1 Figure and Axes¶

This section begins the creation of a second figure that provides a zoomed view of the spectral results, focusing specifically on the higher-frequency range (0.01 to 1 cycles/km) that encompasses the mesoscale and submesoscale regimes. This zoomed perspective allows for more detailed examination of the spectral features in the range where SWOT is expected to show improvements over conventional altimetry.

The code initializes a new figure with dimensions matching the previous comprehensive plot (8×6 inches), ensuring visual consistency between the two perspectives. A single subplot is created to contain the zoomed spectral information.

This focused visualization approach offers several important advantages:

  1. Enhanced detail: By zooming in on the frequency range of greatest interest (corresponding to wavelengths from 1 to 100 km), the figure reveals subtle spectral features that might be difficult to discern in the full-range plot.

  2. Clearer comparison: The zoomed view facilitates more precise comparison of spectral characteristics in the critical mesoscale-to-submesoscale transition region.

  3. Better resolution assessment: It provides a clearer picture of where satellite measurements begin to diverge from in-situ observations, helping to quantify the effective resolution of SWOT.

This complementary zoomed view follows the same overall structure as the comprehensive figure, with spectral curves, regression lines, coherence relationships, and reference elements, but with the x-axis limits adjusted to highlight the frequency range of greatest oceanographic interest.

Together with the full-range figure, this zoomed visualization will provide a complete assessment of SWOT's spectral performance across the full range of relevant spatial scales.

In [ ]:
### 8.2.1 Figure and Axes
fig_zoom = plt.figure(figsize=(8,6))
axz = fig_zoom.add_subplot(111)

8.2.2 Spatial-Scale Background (Zoom)¶

This section recreates the spatial scale reference panels in the zoomed figure, maintaining the same color scheme and scale definitions as in the comprehensive plot. Although the figure will focus on higher frequencies, including all three scale regimes provides continuity with the previous figure and ensures clear understanding of the scale context.

The three scale regimes are again delineated:

  1. Large scale (>100 km) - Yellow background:

    • Frequency range: < 0.01 cycles/km
    • Only the edge of this region will be visible in the zoomed view
  2. Mesoscale (10-100 km) - Green background:

    • Frequency range: 0.01-0.1 cycles/km
    • This regime will be fully visible and form a central focus of the zoomed view
  3. Small scale (<10 km) - Red background:

    • Frequency range: >0.1 cycles/km
    • This regime will be prominently displayed, showing the submesoscale features

Despite the x-axis limits being set to focus on frequencies from 0.01 to 1 cycles/km, including the complete set of background panels maintains visual consistency with the full-range figure and provides important context for interpreting the zoomed spectral features.

The consistent use of colors and scale definitions across both figures helps readers mentally connect the zoomed view to the broader spectral context, facilitating unified interpretation of the results. This approach is particularly valuable for highlighting SWOT's performance in the mesoscale-to-submesoscale transition region while maintaining awareness of how these scales relate to the broader spectrum of oceanographic features.

In [ ]:
### 8.2.2 Spatial‐scale Background (zoom)
axz.axvspan(1e-6, 0.01, color='yellow', alpha=0.1, zorder=0, label='Large scale (>100 km)')
axz.axvspan(0.01, 0.1, color='green', alpha=0.1, zorder=0, label='Mesoscale (10-100 km)')
axz.axvspan(0.1, 1e2, color='red', alpha=0.1, zorder=0, label='Small scale (<10 km)')

8.2.3 Helper for PSD Lines (Zoom)¶

This section defines a helper function specifically for the zoomed plot and uses it to add the power spectral density curves for all three measurement systems. The implementation follows the same approach as in the comprehensive figure, ensuring visual consistency while focusing on the higher-frequency range.

The plot_psd_line_zoom function mirrors the functionality of the previous helper:

  • It filters out invalid data points (NaNs and non-positive values)
  • It ensures sufficient valid points exist for meaningful plotting
  • It applies consistent logarithmic scaling and line styling

The same color scheme is maintained for the three datasets:

  • MVP PSD: First viridis color
  • SWOT PSD: Middle viridis color
  • DUACS PSD: Third viridis color

This consistency in visualization approach between the two figures helps readers make direct connections between the full-range and zoomed perspectives. The zoomed view will highlight several important features that may be harder to discern in the full-range plot:

  1. Spectral divergence points: The frequencies at which satellite measurements begin to deviate from in-situ observations become more apparent

  2. Relative performance: Differences between SWOT and DUACS in capturing higher-frequency features are emphasized

  3. Noise characteristics: The high-frequency noise floor in each dataset becomes more visible

  4. Resolution limits: The effective resolution of each measurement system can be more precisely identified

By maintaining the same visual approach while adjusting the focus to higher frequencies, this zoomed visualization provides enhanced detail in the critical mesoscale-to-submesoscale transition region without sacrificing consistency with the comprehensive spectral overview.

In [ ]:
### 8.2.3 Helper for PSD Lines (zoom)
def plot_psd_line_zoom(freq, psd, color, label):
    if freq is not None and psd is not None:
        valid = (~np.isnan(freq)) & (~np.isnan(psd)) & (psd > 0)
        if np.sum(valid) < 2:
            return None
        line, = axz.loglog(freq[valid], psd[valid], color=color, linestyle='-', label=label)
        return line
    return None

line_mvp_z   = plot_psd_line_zoom(freq_mvp_ens, psd_mvp_ens, color_mvp,   'MVP PSD')
line_swot_z  = plot_psd_line_zoom(freq_swot_ens, psd_swot_ens, color_swot,  'SWOT PSD')
line_duacs_z = plot_psd_line_zoom(freq_duacs_ens, psd_duacs_ens, color_duacs,'DUACS PSD')

8.2.4 Regression Lines (Zoom)¶

This section adds power-law regression lines to the zoomed spectral plot, using the same approach as in the comprehensive figure. These regression lines quantify the spectral slopes, which are key indicators of the underlying physical processes and energy cascade characteristics.

For each dataset (MVP, SWOT, and DUACS), the code:

  1. Computes the best-fit power-law using the loglog_regression function
  2. Plots the resulting regression line with dataset-matching color and dashed style
  3. Includes the numerical slope value in the label

Although the same slope values are used as in the full-range plot (since they're calculated over the entire frequency range), their visualization in the zoomed context offers enhanced insight:

  1. Fit quality assessment: The zoomed view makes it easier to assess how well the power-law model fits the actual data in the mesoscale and submesoscale ranges

  2. Slope differences: Variations in slope between different measurement systems become more apparent, highlighting potential systematic differences in how they capture energy across scales

  3. Deviation identification: Points where the actual spectra deviate from the power-law behavior can be more clearly identified, potentially revealing scale-dependent physical processes or measurement artifacts

The regression lines are particularly valuable in the zoomed view because they help quantify the spectral behavior precisely in the frequency range where SWOT is expected to show improvements over conventional altimetry. By directly comparing the slopes of MVP, SWOT, and DUACS spectra in this critical range, we can assess whether SWOT more accurately captures the true energy scaling observed in the in-situ measurements.

These quantitative slope comparisons provide a rigorous metric for evaluating SWOT's performance in resolving the energy cascade characteristics in the mesoscale-to-submesoscale transition region.
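To make the fit-quality point concrete, the same log-log regression can also be restricted to the zoomed band before fitting. The sketch below is self-contained: `loglog_slope` is a minimal stand-in for the notebook's `loglog_regression`, and the synthetic f**-2 spectrum and band mask are illustrative assumptions, not notebook variables.

```python
import numpy as np

def loglog_slope(freq, psd):
    # Least-squares fit of log10(psd) against log10(freq); returns the slope.
    valid = (freq > 0) & (psd > 0) & ~np.isnan(psd)
    m, _ = np.polyfit(np.log10(freq[valid]), np.log10(psd[valid]), 1)
    return m

# Synthetic spectrum following a clean -2 power law
f = np.logspace(-3, 0, 300)          # 0.001-1 cycles/km
p = 1e3 * f**-2

band = (f >= 0.01) & (f <= 1.0)      # restrict the fit to the zoomed range
slope_band = loglog_slope(f[band], p[band])
```

Restricting the fit this way isolates the spectral behaviour in the displayed band, which is useful when a real spectrum changes slope or hits a noise floor outside the zoom window.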

In [ ]:
### 8.2.4 Regression Lines (zoom)
handles_slopes_z = []
if freq_mvp_ens is not None and psd_mvp_ens is not None:
    m_mvp_z, c_mvp_z = loglog_regression(freq_mvp_ens, psd_mvp_ens)
    if not np.isnan(m_mvp_z):
        reg_line_mvp_z, = axz.loglog(freq_mvp_ens, 10**c_mvp_z * freq_mvp_ens**m_mvp_z, 
                                     color=color_mvp, linestyle='--',
                                     label=f"MVP slope = {m_mvp_z:.2f}")
        handles_slopes_z.append(reg_line_mvp_z)

if freq_swot_ens is not None and psd_swot_ens is not None:
    m_swot_z, c_swot_z = loglog_regression(freq_swot_ens, psd_swot_ens)
    if not np.isnan(m_swot_z):
        reg_line_swot_z, = axz.loglog(freq_swot_ens, 10**c_swot_z * freq_swot_ens**m_swot_z, 
                                      color=color_swot, linestyle='--',
                                      label=f"SWOT slope = {m_swot_z:.2f}")
        handles_slopes_z.append(reg_line_swot_z)

if freq_duacs_ens is not None and psd_duacs_ens is not None:
    m_duacs_z, c_duacs_z = loglog_regression(freq_duacs_ens, psd_duacs_ens)
    if not np.isnan(m_duacs_z):
        reg_line_duacs_z, = axz.loglog(freq_duacs_ens, 10**c_duacs_z * freq_duacs_ens**m_duacs_z, 
                                       color=color_duacs, linestyle='--',
                                       label=f"DUACS slope = {m_duacs_z:.2f}")
        handles_slopes_z.append(reg_line_duacs_z)

8.2.5 Reference-Slope Lines (Zoom)¶

This section adds the theoretical reference slope lines to the zoomed plot, maintaining consistency with the comprehensive figure while providing important context for interpreting the spectral characteristics in relation to established oceanographic turbulence theories.

The code adds the same two reference lines:

  1. Slope -3 (black dotted line):

    • Represents the theoretical spectral slope for quasi-geostrophic (QG) turbulence
    • Expected in the forward energy cascade of balanced geostrophic flows
  2. Slope -2 (gray dotted line):

    • Represents the theoretical slope for surface quasi-geostrophic (SQG) dynamics
    • Often observed in the transition between mesoscale and submesoscale regimes

These reference lines have particular significance in the zoomed view because:

  1. Theoretical transitions: The mesoscale-to-submesoscale transition region highlighted in the zoomed plot is precisely where oceanographic theory suggests a potential shift from -3 (QG) to -2 (SQG) spectral slopes

  2. Resolution assessment: Comparison with these reference lines helps determine whether SWOT can resolve the scales where such theoretical transitions might occur

  3. Filtering effects: Deviations from theoretical slopes at higher frequencies may indicate filtering effects in satellite processing

The reference lines are scaled to the same amplitude as in the comprehensive figure (80% of the first MVP PSD value) and use the same dotted line style for visual consistency. This consistency helps readers connect observations between the two figures while the zoomed perspective provides enhanced detail for precise comparison in the critical higher-frequency range.

By including these theoretical references in the zoomed view, the visualization enables more detailed assessment of how the observed spectral characteristics align with established oceanographic theories at the scales most relevant for evaluating SWOT's performance.
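As a quick standalone sanity check on this construction (using a synthetic `ref_val` rather than the notebook's MVP-derived amplitude): a line built as `ref_val * (f/f[0])**(-3)` has, by construction, a log-log slope of exactly -3 and passes through the anchor point `(f[0], ref_val)`.

```python
import numpy as np

f = np.linspace(1e-3, 1, 200)            # same spacing as f_theory_z
ref_val = 1e4                            # synthetic anchor amplitude
line_m3 = ref_val * (f / f[0]) ** (-3)   # reference line, anchored at (f[0], ref_val)

# Fitting log10(line) against log10(f) recovers the imposed slope
m, c = np.polyfit(np.log10(f), np.log10(line_m3), 1)
```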

In [ ]:
### 8.2.5 Reference‐slope Lines (zoom)
f_theory_z = np.linspace(1e-3, 1, 200)
ref_val_z = psd_mvp_ens[0] * 0.8 if (psd_mvp_ens is not None) and (len(psd_mvp_ens) > 0) else 1e4
theory_line_m3_z, = axz.loglog(f_theory_z, ref_val_z * (f_theory_z/f_theory_z[0])**(-3), 
                               color='black', linestyle=':', label="Ref slope -3")
theory_line_m2_z, = axz.loglog(f_theory_z, ref_val_z * (f_theory_z/f_theory_z[0])**(-2), 
                               color='gray', linestyle=':', label="Ref slope -2")
handles_slopes_z.extend([theory_line_m3_z, theory_line_m2_z])

8.2.6 Secondary X-Axis (Wavelength, Zoom)¶

This section adds a secondary x-axis to the zoomed plot, displaying spatial wavelength instead of frequency, just as in the comprehensive figure. This dual representation is particularly valuable in the zoomed view, as it helps oceanographers immediately recognize the physical scales of features being examined.

The implementation uses the same approach as before:

  • Creating a secondary axis with secondary_xaxis('top')
  • Defining conversion functions between frequency and wavelength using the inverse relationship
  • Adding a clear label "Wavelength (km)"

In the zoomed view, this wavelength axis takes on additional significance:

  1. Key scale identification: The zoomed frequency range of 0.01 to 1 cycles/km corresponds to wavelengths from 1 to 100 km, encompassing:

    • The full mesoscale range (10-100 km)
    • The submesoscale range (1-10 km)
  2. Resolution assessment: The wavelength axis makes it immediately clear at what physical scales satellite measurements begin to diverge from in-situ observations:

    • If divergence occurs at 10 km, this indicates mesoscale-level resolution
    • If agreement extends to 2-5 km, this suggests submesoscale capability
  3. Oceanographic feature context: Specific wavelengths can be directly associated with known oceanographic features:

    • 50-100 km: Typical mesoscale eddies
    • 10-30 km: Small mesoscale features and frontal meanders
    • 1-10 km: Submesoscale eddies and filaments

By maintaining this dual representation in the zoomed view, the figure allows oceanographers to interpret the spectral results directly in terms of the physical scales they're familiar with, enhancing the practical utility of the analysis for understanding SWOT's capabilities in resolving features of different sizes.
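The conversion functions passed to `secondary_xaxis` are simply the reciprocal map λ = 1/f applied in both directions. A minimal standalone illustration of that mapping:

```python
def freq_to_wavelength(f):
    """Spatial frequency (cycles/km) -> wavelength (km)."""
    return 1.0 / f

def wavelength_to_freq(w):
    """Wavelength (km) -> spatial frequency (cycles/km)."""
    return 1.0 / w

# The zoomed limits 0.01 and 1 cycles/km correspond to 100 km and 1 km
lam_low  = freq_to_wavelength(0.01)
lam_high = freq_to_wavelength(1.0)
```

Because the map is its own inverse, the same pair of functions serves for both directions of the secondary axis.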

In [ ]:
### 8.2.6 Secondary X-axis (wavelength, zoom)
secax_z = axz.secondary_xaxis('top', functions=(lambda f: 1/f, lambda w: 1/w))
secax_z.set_xlabel('Wavelength (km)')

8.2.7 Axes Formatting (Zoom)¶

This section applies final formatting to the zoomed figure, maintaining consistency with the comprehensive plot while adjusting the view to focus on higher frequencies. The careful formatting ensures the zoomed plot maintains scientific accuracy while highlighting the critical mesoscale and submesoscale regimes.

The code applies similar formatting as the comprehensive figure:

  1. Axis labels:

    • X-axis: "Frequency (cyc/km)" - Maintains consistent units
    • Y-axis: "PSD [cm²/cyc/km]" - Keeps the same power spectral density units
  2. Title: "Ensemble PSD + Coherence (MVP, SWOT, DUACS) – Zoom" - Clearly identifies this as a zoomed version of the previous plot

  3. Grid: The same subtle grid with dashed lines at 40% opacity is applied, providing reference lines without visual distraction

  4. Logarithmic scales: Both axes maintain logarithmic scaling for consistent representation of spectral characteristics

  5. X-axis limits: This is the one substantive change from the comprehensive figure: the frequency range is set to 0.01 to 1 cycles/km, corresponding to wavelengths from 1 km to 100 km. This zoomed range:

    • Focuses on the mesoscale (10-100 km) and submesoscale (1-10 km) regimes
    • Highlights the critical transition region between these regimes
    • Emphasizes the frequency range where SWOT is expected to show improvements over conventional altimetry

This focused view provides enhanced detail in the regions of greatest interest for evaluating SWOT's performance while maintaining visual and conceptual consistency with the comprehensive figure. By zooming in on these specific scales, the plot enables more precise assessment of:

  • The exact frequency where satellite and in-situ measurements begin to diverge
  • The relative performance of SWOT versus DUACS at higher frequencies
  • The presence of scale-dependent features that might be obscured in the full-range view

These insights are crucial for quantifying SWOT's effective resolution and its ability to capture the dynamics of mesoscale and submesoscale oceanographic features.

In [ ]:
### 8.2.7 Axes Formatting (zoom)
axz.set_xlabel("Frequency (cyc/km)")
axz.set_ylabel("PSD [cm²/cyc/km]")
axz.set_title("Ensemble PSD + Coherence (MVP, SWOT, DUACS) – Zoom")
axz.grid(True, which='both', linestyle='--', alpha=0.4)
axz.set_xscale('log')
axz.set_yscale('log')
axz.set_xlim(0.01, 1)

8.2.8 Secondary Y-Axis for Coherence (Zoom)¶

This section adds a secondary y-axis to display coherence relationships in the zoomed figure, following the same approach as in the comprehensive plot. This dual-axis visualization enables simultaneous assessment of both energy distribution and linear relationships between measurement systems, with enhanced focus on the higher-frequency range.

The secondary y-axis is configured identically to the comprehensive figure:

  • Created with axz.twinx() to share the x-axis while having an independent y-scale
  • Y-limits set to 0-2 so that the coherence curves (which span 0-1) occupy only the lower half of the axis
  • Ticks placed at 0, 0.5, 1, 1.5, and 2, with labels shown only for 0, 0.5, and 1
  • Clearly labeled as "Coherence (0-1)"

The same three coherence relationships are plotted with consistent dark colors:

  1. MVP-SWOT coherence (dark blue):

    • Primary validation metric showing agreement between in-situ and SWOT measurements
    • Particularly important in the zoomed view to assess at what scale coherence begins to decline
  2. MVP-DUACS coherence (dark green):

    • Baseline comparison showing agreement between in-situ and conventional altimetry
    • Allows direct evaluation of whether SWOT maintains higher coherence than DUACS at smaller scales
  3. SWOT-DUACS coherence (dark red):

    • Reveals similarities and differences between the two satellite products
    • Helps identify where SWOT diverges from conventional altimetry approaches

In the zoomed view, these coherence curves take on enhanced significance because:

  1. Resolution limits: The frequency at which coherence drops below a threshold value (e.g., 0.6) can define the effective resolution of satellite measurements

  2. Relative performance: Differences in coherence between SWOT and DUACS become more apparent, particularly at higher frequencies

  3. Noise identification: Rapid coherence drops may indicate the scale at which noise begins to dominate the satellite measurements

The zoomed perspective allows more precise identification of these transition points, providing crucial insight into SWOT's capability to resolve finer-scale features compared to conventional altimetry. By examining where coherence begins to decline between MVP and satellite measurements, we can quantify SWOT's effective resolution in capturing real oceanographic signals.
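The resolution-limit idea in point 1 can be sketched as a small standalone helper; the 0.6 cutoff and the synthetic coherence curve below are illustrative assumptions, not values produced by this notebook.

```python
import numpy as np

def effective_resolution(freq, coh, threshold=0.6):
    """Smallest resolved wavelength (km): the inverse of the first
    frequency, scanning from low to high, at which coherence falls
    below the chosen threshold."""
    order = np.argsort(freq)
    freq = np.asarray(freq)[order]
    coh = np.asarray(coh)[order]
    below = np.nonzero(coh < threshold)[0]
    if below.size == 0:
        return 1.0 / freq[-1]      # coherent across the whole range
    return 1.0 / freq[below[0]]

# Synthetic coherence decaying with frequency (cycles/km)
f = np.array([0.01, 0.02, 0.05, 0.10, 0.20, 0.50])
c = np.array([0.95, 0.90, 0.80, 0.65, 0.45, 0.20])
res_km = effective_resolution(f, c)   # first drop below 0.6 occurs at f = 0.2
```

With these synthetic numbers the helper reports a 5 km effective resolution; applied to the MVP-SWOT and MVP-DUACS coherence curves, the same comparison quantifies how much finer a scale SWOT resolves than conventional altimetry.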

In [ ]:
### 8.2.8 Secondary Y-axis for Coherence (zoom)
ax2z = axz.twinx()
ax2z.set_ylim(0, 2)
ax2z.set_yticks([0, 0.5, 1, 1.5, 2])
ax2z.set_yticklabels(['0', '0.5', '1', '', ''])
ax2z.set_ylabel('Coherence (0-1)')

handles_coherence_z = []

# MVP–SWOT coherence (zoom)
if freq_coh_ens_mvp_swot is not None and coh_ens_mvp_swot is not None:
    valid_coh_z = (~np.isnan(freq_coh_ens_mvp_swot)) & (~np.isnan(coh_ens_mvp_swot)) & (freq_coh_ens_mvp_swot > 0)
    if np.sum(valid_coh_z) > 1:
        line_coherence_mvp_swot_z, = ax2z.semilogx(freq_coh_ens_mvp_swot[valid_coh_z],
                                                  coh_ens_mvp_swot[valid_coh_z],
                                                  color=coherence_colors[0],
                                                  label='Coherence MVP-SWOT',
                                                  zorder=3)
        handles_coherence_z.append(line_coherence_mvp_swot_z)

# MVP–DUACS coherence (zoom)
if freq_coh_ens_mvp_duacs is not None and coh_ens_mvp_duacs is not None:
    valid_coh_z = (~np.isnan(freq_coh_ens_mvp_duacs)) & (~np.isnan(coh_ens_mvp_duacs)) & (freq_coh_ens_mvp_duacs > 0)
    if np.sum(valid_coh_z) > 1:
        line_coherence_mvp_duacs_z, = ax2z.semilogx(freq_coh_ens_mvp_duacs[valid_coh_z],
                                                   coh_ens_mvp_duacs[valid_coh_z],
                                                   color=coherence_colors[1],
                                                   label='Coherence MVP-DUACS',
                                                   zorder=3)
        handles_coherence_z.append(line_coherence_mvp_duacs_z)

# SWOT–DUACS coherence (zoom)
if freq_coh_ens_swot_duacs is not None and coh_ens_swot_duacs is not None:
    valid_coh_z = (~np.isnan(freq_coh_ens_swot_duacs)) & (~np.isnan(coh_ens_swot_duacs)) & (freq_coh_ens_swot_duacs > 0)
    if np.sum(valid_coh_z) > 1:
        line_coherence_swot_duacs_z, = ax2z.semilogx(freq_coh_ens_swot_duacs[valid_coh_z],
                                                    coh_ens_swot_duacs[valid_coh_z],
                                                    color=coherence_colors[2],
                                                    label='Coherence SWOT-DUACS',
                                                    zorder=3)
        handles_coherence_z.append(line_coherence_swot_duacs_z)

8.2.9 Legend (Zoom)¶

This final section completes the zoomed figure by adding a comprehensive legend that identifies all visual elements, following the same approach as in the full-range plot. This ensures consistency between the two figures while providing clear reference for interpreting the zoomed spectral results.

The legend construction mirrors the previous approach:

  1. Element collection: The code gathers handles from all visual components:

    • Background shading patches for different scale regimes
    • PSD curves for MVP, SWOT, and DUACS
    • Regression lines showing spectral slopes
    • Coherence curves for different dataset pairs
  2. Legend positioning:

    • Located below the figure (loc='lower center')
    • Centered horizontally (bbox_to_anchor=(0.5, -0.25))
    • Arranged in three columns (ncol=3) for compact presentation
    • Separated from the plot area with adequate space (bottom=0.3)
  3. Layout optimization:

    • tight_layout() maximizes use of figure space
    • subplots_adjust(bottom=0.3) ensures room for the legend

This consistent approach to legend construction ensures that readers can easily interpret both figures using the same visual reference system. The zoomed figure is then displayed with plt.show(), completing the visualization process.

Together, the comprehensive and zoomed figures provide a complete spectral analysis of SWOT performance:

  1. The comprehensive figure offers a broad overview across the full range of spatial scales
  2. The zoomed figure provides detailed insight into the critical mesoscale-to-submesoscale transition region

This dual-visualization approach enables both qualitative assessment and quantitative analysis of how well SWOT captures oceanographic features across different spatial scales. The zoomed perspective, in particular, allows precise identification of:

  • The scale-dependent agreement between satellite and in-situ measurements
  • The relative performance of SWOT compared to conventional altimetry
  • The effective resolution limits of different measurement systems

These insights are crucial for understanding SWOT's capabilities and limitations in resolving fine-scale oceanographic features, with direct implications for oceanographic research and operational applications.

In [ ]:
### 8.2.9 Legend (zoom)
patches_fondo_z = list(axz.patches)   # background spans added in 8.2.2
handles_curves_z = [h for h in [line_mvp_z, line_swot_z, line_duacs_z] if h is not None]
handles_all_zoom = patches_fondo_z + handles_curves_z + handles_slopes_z + handles_coherence_z
labels_all_zoom = [obj.get_label() for obj in handles_all_zoom]

fig_zoom.legend(handles_all_zoom, labels_all_zoom, loc='lower center',
                bbox_to_anchor=(0.5, -0.25), ncol=3, borderaxespad=0.)

plt.tight_layout()
plt.subplots_adjust(bottom=0.3)
plt.show()

Integral Code and Plots (IV)¶

In [10]:
# 1. IMPORT LIBRARIES
## 1.1 Core scientific stack
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import scipy.signal as signal
import warnings
import gsw
import scipy.io as sio
import geopy.distance
from datetime import datetime, timedelta
import glob, os
from scipy.interpolate import griddata
from tqdm import tqdm
import xarray as xr

warnings.filterwarnings("ignore", category=RuntimeWarning)

# 2. CONFIGURATION PARAMETERS
MIN_SEGMENT_LENGTH = 10   # minimum points to retain a segment

# 3. DATE UTILITIES
## 3.1 MATLAB ordinal‐date conversion
def read_mat_datetime(date_array):
    """
    Convert MATLAB ordinal dates in 'date_array' to Python datetime objects.
    """
    dates_list = []
    array_flat = date_array.reshape(-1)
    for val_ in array_flat:
        vv = val_
        while isinstance(vv, (list, tuple, np.ndarray)):
            if isinstance(vv, np.ndarray):
                vv = vv.item() if vv.size == 1 else vv.ravel()[0]
            else:  # list or tuple
                vv = vv[0] if vv else None
        if isinstance(vv, (int, float)):
            frac = vv % 1
            try:
                dt_ = datetime.fromordinal(int(vv)) + timedelta(days=frac) - timedelta(days=366)
                dates_list.append(dt_)
            except (ValueError, OverflowError):
                pass
    return dates_list

# 4. DATA LOADING FUNCTIONS
## 4.1 Generic SSH/U/V reader
def loadsshuv(fname, rep, varn, **kwargs):
    """
    Load longitude, latitude, SSH, velocity, and ADT fields from a NetCDF file.
    """
    if 'type' in kwargs and kwargs['type'] == 'swot':
        filei2 = glob.glob(rep + fname)[0]
        ds = xr.open_dataset(filei2)
        if 'area' in kwargs:
            area = kwargs['area']
            selection = ((ds.longitude > area[0]) &
                         (ds.longitude < area[2]) &
                         (ds.latitude > area[1]) &
                         (ds.latitude < area[3]))
            ds = ds.where(selection)
        lons = ds.longitude.to_masked_array()
        lats = ds.latitude.to_masked_array()
        us   = ds.ugos.to_masked_array()
        vs   = ds.vgos.to_masked_array()
        ssha = ds.ssha_filtered.to_masked_array()
        mss  = ds.mss.to_masked_array()
        mdt  = ds.mdt.to_masked_array()
        mask_row = np.all(np.isnan(us), axis=1)
        u    = us[~mask_row, :]
        v    = vs[~mask_row, :]
        lon  = lons[~mask_row, :]
        lat  = lats[~mask_row, :]
        ssha = ssha[~mask_row, :]
        mss  = mss[~mask_row, :]
        mdt  = mdt[~mask_row, :]
        adt = ssha + mdt
    else:
        filei = glob.glob(rep + fname)[0]
        ds = xr.open_dataset(filei)
        lon  = ds[varn['longitude']][:]
        lat  = ds[varn['latitude']][:]
        ssha = ds[varn['ssh']][:]
        adt  = ds['adt'][:]
        u    = ds[varn['u']][:]
        v    = ds[varn['v']][:]
    return {'lon': lon, 'lat': lat, 'ssh': ssha, 'u': u, 'v': v,
            'ssha': ssha, 'adt': adt}

## 4.2 SWOT colocation
def load_SWOT(mvp, dayv, swotdir, savefig, **kwargs):
    """
    Load SWOT data and interpolate ADT onto MVP points.
    """
    domain = [np.floor(np.min(mvp['lon'])) - 0.5,
              np.floor(np.min(mvp['lat'])) - 0.5,
              np.ceil(np.max(mvp['lon'])) + 0.5,
              np.ceil(np.max(mvp['lat'])) + 0.5]
    dayv_dt = datetime.strptime(dayv, "%Y-%m-%d")
    varn = {'longitude': 'longitude', 'latitude': 'latitude',
            'u': 'ugos', 'v': 'vgos', 'ssh': 'ssha_filtered'}
    fname = glob.glob(swotdir + 'SWOT_L3_LR_SSH_Expert_*_' +
                      dayv_dt.strftime('%Y%m%d') + '*_*.nc')
    if not fname:
        print(f"No SWOT file found for date {dayv}")
        return None
    swot = loadsshuv(fname[0], '', varn, type='swot', area=domain)
    points = np.column_stack((swot['lon'].flatten(), swot['lat'].flatten()))
    adtc = griddata(points, swot['adt'].flatten(), (mvp['lon'], mvp['lat']), method='linear')
    return np.squeeze(adtc)

## 4.3 DUACS colocation
def load_DUACS(mvp, dayv, duacsdir, savefig, **kwargs):
    """
    Load DUACS data and interpolate ADT onto MVP points.
    """
    dayv_dt = datetime.strptime(dayv, "%Y-%m-%d")
    varn = {'longitude': 'longitude', 'latitude': 'latitude',
            'u': 'ugos', 'v': 'vgos', 'ssh': 'sla'}
    fname = glob.glob(duacsdir + 'nrt_europe_allsat_phy_l4_' +
                      dayv_dt.strftime('%Y%m%d') + '_*.nc')
    if not fname:
        print(f"No DUACS file found for date {dayv}")
        return None
    duacs = loadsshuv(fname[0], '', varn)
    X, Y = np.meshgrid(duacs['lon'].values, duacs['lat'].values)
    points = np.column_stack((X.flatten(), Y.flatten()))
    adtd = griddata(points, duacs['adt'].values.flatten(), (mvp['lon'], mvp['lat']), method='linear')
    return np.squeeze(adtd)

# 5. SPECTRAL ANALYSIS UTILITIES
## 5.1 Segmentation and power spectral density
### 5.1.1 NaN‐aware splitting
def split_on_nan(data_array, max_consecutive_nans=4):
    """
    Split a 1-D array into continuous segments separated by ≥ max_consecutive_nans NaNs.
    """
    segments = []
    n = len(data_array)
    start_idx = 0
    nan_count = 0
    for i in range(n):
        nan_count = nan_count + 1 if np.isnan(data_array[i]) else 0
        if nan_count >= max_consecutive_nans:
            end_idx = i - max_consecutive_nans + 1   # exclusive bound: keep the last valid point before the NaN run
            if end_idx > start_idx:
                segm = data_array[start_idx:end_idx]
                if len(segm) > 1:
                    segments.append(segm)
            start_idx = i + 1                        # resume after the current NaN
            nan_count = 0
    if start_idx < n:
        segm = data_array[start_idx:]
        if len(segm) > 1:
            segments.append(segm)
    return segments

### 5.1.2 Welch PSD
def welch_psd_1D(data_array, fs=1.0):
    """
    Compute 1-D PSD via Welch’s method (Hann window).
    """
    n = len(data_array)
    if n < 8:
        return None, None
    nperseg = min(int(n // 2), 256)
    freq, psd = signal.welch(data_array, fs=fs, nperseg=nperseg,
                             noverlap=nperseg // 2, window='hann',
                             detrend='constant')
    return freq, psd

### 5.1.3 PSD for segmented data
def compute_segments_psd(data_array, max_nan=4, fs=1.0, min_length=10):
    """
    Compute Welch PSD for each sufficiently long segment of 'data_array'.
    """
    segments = split_on_nan(data_array, max_nan)
    results = []
    for seg in segments:
        seg_clean = seg[~np.isnan(seg)]
        if len(seg_clean) < min_length:
            continue
        f, p = welch_psd_1D(seg_clean, fs)
        if f is not None and p is not None:
            results.append((f, p))
    return results

### 5.1.4 Log–log regression of PSD
def loglog_regression(freq, psd):
    """
    Fit log10(PSD) = m log10(freq) + c and return slope m, intercept c.
    """
    valid = (~np.isnan(freq)) & (~np.isnan(psd)) & (psd > 0) & (freq > 0)
    if np.sum(valid) < 2:
        return np.nan, np.nan
    m, c = np.polyfit(np.log10(freq[valid]), np.log10(psd[valid]), 1)
    return m, c

## 5.2 Spectral coherence
### 5.2.1 Paired NaN‐aware splitting
def split_on_nan_pair(data1, data2, max_consecutive_nans=4):
    """
    Split paired arrays into segments, honouring NaNs in either input.
    """
    data1 = np.asarray(data1)
    data2 = np.asarray(data2)
    combined_nan = np.isnan(data1) | np.isnan(data2)

    segments = []
    n = len(data1)
    start_idx = 0
    nan_count = 0
    for i in range(n):
        nan_count = nan_count + 1 if combined_nan[i] else 0
        if nan_count >= max_consecutive_nans:
            end_idx = i - max_consecutive_nans + 1   # exclusive bound: keep the last valid point before the NaN run
            if end_idx > start_idx:
                segm1 = data1[start_idx:end_idx]
                segm2 = data2[start_idx:end_idx]
                if len(segm1) > 1:
                    segments.append((segm1, segm2))
            start_idx = i + 1                        # resume after the current NaN
            nan_count = 0
    if start_idx < n:
        segm1 = data1[start_idx:]
        segm2 = data2[start_idx:]
        if len(segm1) > 1:
            segments.append((segm1, segm2))
    return segments

### 5.2.2 Welch coherence
def welch_coherence_1D(signal1, signal2, fs=1.0):
    """
    Compute Welch spectral coherence between two signals.
    """
    n = len(signal1)
    if n < 8:
        return None, None
    nperseg = min(int(n // 2), 256)
    freq, coh = signal.coherence(signal1, signal2, fs=fs, window='hann',
                                 nperseg=nperseg, noverlap=nperseg // 2,
                                 detrend='constant')
    return freq, coh

### 5.2.3 Coherence for segmented data
def compute_segments_coherence(data1, data2, max_nan=4, fs=1.0, min_length=10):
    """
    Compute Welch coherence for each paired segment of (data1, data2).
    """
    segments = split_on_nan_pair(data1, data2, max_nan)
    results = []
    for seg1, seg2 in segments:
        mask_valid = (~np.isnan(seg1)) & (~np.isnan(seg2))
        seg1_clean = seg1[mask_valid]
        seg2_clean = seg2[mask_valid]
        if len(seg1_clean) < min_length:
            continue
        f, co = welch_coherence_1D(seg1_clean, seg2_clean, fs=fs)
        if f is not None and co is not None:
            results.append((f, co))
    return results

# 6. LOOP OVER PROFILING LINES
## 6.1 Initialisation of accumulators
PLs = np.arange(5, 25)                       # profiling lines 5 – 24

# Power-spectral-density (PSD) containers
global_freq_mvp,   global_psd_mvp   = [], []
global_freq_swot,  global_psd_swot  = [], []
global_freq_duacs, global_psd_duacs = [], []

# Spectral-coherence containers
global_freq_coh_mvp_swot,  global_coh_mvp_swot   = [], []
global_freq_coh_mvp_duacs, global_coh_mvp_duacs  = [], []
global_freq_coh_swot_duacs, global_coh_swot_duacs = [], []

## 6.2 Iterative processing
for PL in PLs:
    print(f"\n[SPECTRAL ANALYSIS] – PL{PL} in progress...")
    try:
        # ### 6.2.1 MVP data loading and dynamic-height computation
        mat_file = sio.loadmat(f'/home/bessiere/Documents/MVP_TOOLS/BIOSWOTMED/MVP/L2/PL{PL}.mat')
        date_array = mat_file['metadata']['date']
        dates_mvp = read_mat_datetime(date_array)
        if len(dates_mvp) == 0:
            raise ValueError(f"No valid date in PL{PL}")

        mat = sio.loadmat(f'/home/bessiere/Documents/MVP_TOOLS/BIOSWOTMED/MVP/L2_python/PL{PL}_python.mat')
        lon_mvp = mat['lon']
        lat_mvp = mat['lat']
        p_mvp   = mat['pres']
        t_mvp   = mat['temp']
        SP_mvp  = mat['sal']

        SA = gsw.SA_from_SP(SP_mvp, p_mvp, lon_mvp.T, lat_mvp.T)
        CT = gsw.CT_from_t(SA, t_mvp, p_mvp)
        g_ = 9.81
        dh_list = []
        for i in range(SA.shape[0]):
            valid_ = ~np.isnan(SA[i])
            if np.sum(valid_) == 0:
                dh_list.append(np.nan)
                continue
            p_ref = np.nanmax(p_mvp[i][valid_])                       # p_ref = p_max
            if np.isnan(p_ref):
                dh_list.append(np.nan)
                continue
            dh_i = gsw.geo_strf_dyn_height(SA[i][valid_], CT[i][valid_],
                                           p_mvp[i][valid_], p_ref=p_ref)[0] / g_
            if isinstance(dh_i, np.ndarray):
                dh_i = dh_i.item() if dh_i.size == 1 else dh_i[0]
            dh_list.append(dh_i)
        dh_array = np.array(dh_list, dtype=float)
        if np.all(np.isnan(dh_array)):
            raise ValueError(f"PL{PL}: All dh are NaN")

        dhc = dh_array - np.nanmean(dh_array)                         # centred anomaly

        # ### 6.2.2 MVP transect dataframe and distance along track
        n_profiles = SA.shape[0]
        if lon_mvp.shape[1] < n_profiles:
            raise ValueError(f"PL{PL}: lon_mvp dimension mismatch")
        df_mvp_raw = pd.DataFrame({
            'dh':  dh_array,
            'dhc': dhc,
            'lon': lon_mvp[0, :n_profiles],
            'lat': lat_mvp[0, :n_profiles]
        })
        dist_along = np.zeros(n_profiles)
        for i in range(1, n_profiles):
            dist_along[i] = dist_along[i-1] + geopy.distance.distance(
                (df_mvp_raw['lat'].values[i-1], df_mvp_raw['lon'].values[i-1]),
                (df_mvp_raw['lat'].values[i],   df_mvp_raw['lon'].values[i])
            ).km
        df_mvp_raw['dist'] = dist_along

        # ### 6.2.3 Regular 1 km re-sampling (no smoothing)
        step_km  = 1
        dist_max = df_mvp_raw['dist'].iloc[-1]
        dist_reg = np.arange(0, dist_max + step_km, step_km)
        lon_reg  = np.interp(dist_reg, df_mvp_raw['dist'], df_mvp_raw['lon'])
        lat_reg  = np.interp(dist_reg, df_mvp_raw['dist'], df_mvp_raw['lat'])
        dh_reg   = np.interp(dist_reg, df_mvp_raw['dist'], df_mvp_raw['dh'])
        dhc_reg  = np.interp(dist_reg, df_mvp_raw['dist'], df_mvp_raw['dhc'])
        df_mvp = pd.DataFrame({
            'dist': dist_reg,
            'lon':  lon_reg,
            'lat':  lat_reg,
            'dh':   dh_reg,
            'dhc':  dhc_reg
        })

        # ### 6.2.4 External ADT colocation (SWOT & DUACS)
        dayv_str = dates_mvp[0].strftime('%Y-%m-%d')
        adt_swot = load_SWOT(df_mvp, dayv_str,
                             '/home/favaloro/Téléchargements/SWOT_V2.0.0/v2.0.0/', '')
        adt_swot = np.full(len(df_mvp), np.nan) if adt_swot is None else adt_swot - np.nanmean(adt_swot)

        adt_duacs = load_DUACS(df_mvp, dayv_str,
                               '/home/bessiere/Documents/DUACS/', '')
        adt_duacs = np.full(len(df_mvp), np.nan) if adt_duacs is None else adt_duacs - np.nanmean(adt_duacs)

        # ### 6.2.5 PSD computation and storage
        segments_psd_mvp = compute_segments_psd(df_mvp['dhc'].values, max_nan=4,
                                                fs=1.0, min_length=MIN_SEGMENT_LENGTH)
        for f, p in segments_psd_mvp:
            global_freq_mvp.append(f)
            global_psd_mvp.append(p)

        segments_psd_swot = compute_segments_psd(adt_swot, max_nan=4,
                                                 fs=1.0, min_length=MIN_SEGMENT_LENGTH)
        for f, p in segments_psd_swot:
            global_freq_swot.append(f)
            global_psd_swot.append(p)

        segments_psd_duacs = compute_segments_psd(adt_duacs, max_nan=4,
                                                  fs=1.0, min_length=MIN_SEGMENT_LENGTH)
        for f, p in segments_psd_duacs:
            global_freq_duacs.append(f)
            global_psd_duacs.append(p)

        # ### 6.2.6 Coherence computation and storage
        segments_coh_mvp_swot = compute_segments_coherence(df_mvp['dhc'].values,
                                                           adt_swot, max_nan=4,
                                                           fs=1.0, min_length=MIN_SEGMENT_LENGTH)
        for f, co in segments_coh_mvp_swot:
            global_freq_coh_mvp_swot.append(f)
            global_coh_mvp_swot.append(co)

        segments_coh_mvp_duacs = compute_segments_coherence(df_mvp['dhc'].values,
                                                            adt_duacs, max_nan=4,
                                                            fs=1.0, min_length=MIN_SEGMENT_LENGTH)
        for f, co in segments_coh_mvp_duacs:
            global_freq_coh_mvp_duacs.append(f)
            global_coh_mvp_duacs.append(co)

        segments_coh_swot_duacs = compute_segments_coherence(adt_swot, adt_duacs,
                                                             max_nan=4, fs=1.0,
                                                             min_length=MIN_SEGMENT_LENGTH)
        for f, co in segments_coh_swot_duacs:
            global_freq_coh_swot_duacs.append(f)
            global_coh_swot_duacs.append(co)

    except Exception as e:
        print(f"Error in PL{PL}: {e}")
        continue
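For reference, the per-transect distance and resampling steps (sections 6.2.2–6.2.3) can be reproduced without the geopy dependency. The sketch below is a minimal stand-in with synthetic coordinates: it uses a haversine great-circle distance, which agrees with geopy's geodesic distance to well under a percent at these scales.

```python
import numpy as np

def haversine_km(lat1, lon1, lat2, lon2, R=6371.0):
    """Great-circle distance (km); a close approximation of geopy's
    geodesic distance over transect-scale separations."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2)**2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2)**2)
    return 2 * R * np.arcsin(np.sqrt(a))

# Synthetic transect (hypothetical NW-Mediterranean coordinates)
lat = np.linspace(42.0, 42.5, 30)
lon = np.linspace(5.0, 5.8, 30)
dh = np.sin(np.linspace(0, 2 * np.pi, 30))       # stand-in dynamic height

# Cumulative along-track distance, as in section 6.2.2
dist = np.concatenate(
    ([0.0], np.cumsum(haversine_km(lat[:-1], lon[:-1], lat[1:], lon[1:]))))

# Regular 1 km re-sampling, as in section 6.2.3
dist_reg = np.arange(0, dist[-1] + 1.0, 1.0)
dh_reg = np.interp(dist_reg, dist, dh)
```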

# 7. ENSEMBLE AVERAGING 
## 7.1 Ensemble-average helpers
def ensemble_average_psd(freq_list, psd_list):
    """
    Return common-grid frequency and ensemble-mean PSD.
    """
    if len(freq_list) == 0:
        return None, None
    freq_common = freq_list[0][freq_list[0] > 0]
    psd_interp_all = [psd_list[0][freq_list[0] > 0]]
    for freq_, psd_ in zip(freq_list[1:], psd_list[1:]):
        valid = freq_ > 0
        psd_interp = psd_ if np.array_equal(freq_[valid], freq_common) else \
                     np.interp(freq_common, freq_[valid], psd_[valid],
                               left=np.nan, right=np.nan)
        psd_interp_all.append(psd_interp)
    return freq_common, np.nanmean(np.array(psd_interp_all), axis=0)

def ensemble_average_coherence(freq_list, coh_list):
    """
    Return common-grid frequency and ensemble-mean coherence.
    """
    if len(freq_list) == 0:
        return None, None
    freq_common = freq_list[0][freq_list[0] > 0]
    coh_interp_all = [coh_list[0][freq_list[0] > 0]]
    for freq_, coh_ in zip(freq_list[1:], coh_list[1:]):
        valid = freq_ > 0
        coh_interp = coh_ if np.array_equal(freq_[valid], freq_common) else \
                     np.interp(freq_common, freq_[valid], coh_[valid],
                               left=np.nan, right=np.nan)
        coh_interp_all.append(coh_interp)
    return freq_common, np.nanmean(np.array(coh_interp_all), axis=0)
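A minimal numpy sketch of the common-grid averaging the two helpers perform, using two synthetic spectra on different frequency grids (the values are illustrative only):

```python
import numpy as np

# Two synthetic PSD segments on different frequency grids
f1 = np.linspace(0.0, 0.5, 11)      # includes f = 0, dropped below
p1 = 1.0 / (1.0 + f1)
f2 = np.linspace(0.0, 0.5, 21)
p2 = 2.0 / (1.0 + f2)

# Common grid = positive frequencies of the first segment
freq_common = f1[f1 > 0]
stack = [p1[f1 > 0]]
valid = f2 > 0
stack.append(np.interp(freq_common, f2[valid], p2[valid],
                       left=np.nan, right=np.nan))   # NaN outside segment range

psd_ens = np.nanmean(np.array(stack), axis=0)        # NaN-aware ensemble mean
```

Here the ensemble mean is exactly `1.5 / (1 + f)`, the average of the two input spectra.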

## 7.2 Ensemble PSD and coherence computation
freq_mvp_ens,   psd_mvp_ens   = ensemble_average_psd(global_freq_mvp,   global_psd_mvp)
freq_swot_ens,  psd_swot_ens  = ensemble_average_psd(global_freq_swot,  global_psd_swot)
freq_duacs_ens, psd_duacs_ens = ensemble_average_psd(global_freq_duacs, global_psd_duacs)

freq_coh_ens_mvp_swot,  coh_ens_mvp_swot  = ensemble_average_coherence(global_freq_coh_mvp_swot,  global_coh_mvp_swot)
freq_coh_ens_mvp_duacs, coh_ens_mvp_duacs = ensemble_average_coherence(global_freq_coh_mvp_duacs, global_coh_mvp_duacs)
freq_coh_ens_swot_duacs, coh_ens_swot_duacs = ensemble_average_coherence(global_freq_coh_swot_duacs, global_coh_swot_duacs)

# Convert PSD units: m² / (cyc km⁻¹) → cm² / (cyc km⁻¹)
if psd_mvp_ens  is not None: psd_mvp_ens  *= 1.0e4
if psd_swot_ens is not None: psd_swot_ens *= 1.0e4
if psd_duacs_ens is not None: psd_duacs_ens *= 1.0e4
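`compute_segments_psd` and `compute_segments_coherence` are defined earlier in the notebook; the spectral core they are assumed to wrap can be sketched with `scipy.signal` on synthetic 1 km-sampled series. The `nperseg` value here is an illustrative choice, not necessarily the notebook's.

```python
import numpy as np
from scipy import signal

rng = np.random.default_rng(0)
n = 256                                              # samples at 1 km spacing -> fs = 1 cyc/km
x = np.sin(2 * np.pi * 0.05 * np.arange(n))          # 20 km-wavelength signal
y = x + 0.1 * rng.standard_normal(n)                 # noisy copy of the same signal

# PSD via Welch's method (units: signal² per cyc/km)
f_psd, pxx = signal.welch(x, fs=1.0, nperseg=128)

# Magnitude-squared coherence between the two series (bounded by [0, 1])
f_coh, cxy = signal.coherence(x, y, fs=1.0, nperseg=128)
```

The PSD peaks at the 0.05 cyc/km line, and the coherence between `x` and its noisy copy stays close to 1 near that frequency.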

# 8. VISUALISATION
## 8.1 Complete Figure – Ensemble PSD and Coherence

### 8.1.1 Figure and Axes
fig = plt.figure(figsize=(8,6))
ax = fig.add_subplot(111)

### 8.1.2 Spatial‐scale Background
ax.axvspan(1e-6, 0.01, color='yellow', alpha=0.1, zorder=0, label='Large scale (>100 km)')
ax.axvspan(0.01, 0.1, color='green', alpha=0.1, zorder=0, label='Mesoscale (10-100 km)')
ax.axvspan(0.1, 1e2, color='red', alpha=0.1, zorder=0, label='Small scale (<10 km)')

### 8.1.3 Helper for PSD Lines
def plot_psd_line(freq, psd, color, label):
    if freq is not None and psd is not None:
        valid = (~np.isnan(freq)) & (~np.isnan(psd)) & (psd > 0)
        if np.sum(valid) < 2:
            return None
        line, = ax.loglog(freq[valid], psd[valid], color=color, linestyle='-', label=label)
        return line
    return None

colors = plt.cm.viridis(np.linspace(0.3, 0.7, 3))  # colour map sampling
color_mvp, color_swot, color_duacs = colors[0], colors[1], colors[2]

### 8.1.4 PSD Curves
line_mvp   = plot_psd_line(freq_mvp_ens, psd_mvp_ens, color_mvp,   'MVP PSD')
line_swot  = plot_psd_line(freq_swot_ens, psd_swot_ens, color_swot,  'SWOT PSD')
line_duacs = plot_psd_line(freq_duacs_ens, psd_duacs_ens, color_duacs,'DUACS PSD')

### 8.1.5 Log–log Regression Lines
handles_slopes = []
if freq_mvp_ens is not None and psd_mvp_ens is not None:
    m_mvp, c_mvp = loglog_regression(freq_mvp_ens, psd_mvp_ens)
    if not np.isnan(m_mvp):
        reg_line_mvp, = ax.loglog(freq_mvp_ens, 10**c_mvp * freq_mvp_ens**m_mvp, 
                                  color=color_mvp, linestyle='--',
                                  label=f"MVP slope = {m_mvp:.2f}")
        handles_slopes.append(reg_line_mvp)

if freq_swot_ens is not None and psd_swot_ens is not None:
    m_swot, c_swot = loglog_regression(freq_swot_ens, psd_swot_ens)
    if not np.isnan(m_swot):
        reg_line_swot, = ax.loglog(freq_swot_ens, 10**c_swot * freq_swot_ens**m_swot, 
                                   color=color_swot, linestyle='--',
                                   label=f"SWOT slope = {m_swot:.2f}")
        handles_slopes.append(reg_line_swot)

if freq_duacs_ens is not None and psd_duacs_ens is not None:
    m_duacs, c_duacs = loglog_regression(freq_duacs_ens, psd_duacs_ens)
    if not np.isnan(m_duacs):
        reg_line_duacs, = ax.loglog(freq_duacs_ens, 10**c_duacs * freq_duacs_ens**m_duacs, 
                                    color=color_duacs, linestyle='--',
                                    label=f"DUACS slope = {m_duacs:.2f}")
        handles_slopes.append(reg_line_duacs)

### 8.1.6 Reference‐slope Lines
f_theory = np.linspace(1e-3, 1, 200)
ref_val = psd_mvp_ens[0] * 0.8 if (psd_mvp_ens is not None) and (len(psd_mvp_ens) > 0) else 1e4
theory_line_m3, = ax.loglog(f_theory, ref_val * (f_theory/f_theory[0])**(-3), 
                            color='black', linestyle=':', label="Ref slope -3")
theory_line_m2, = ax.loglog(f_theory, ref_val * (f_theory/f_theory[0])**(-2), 
                            color='gray', linestyle=':', label="Ref slope -2")
handles_slopes.extend([theory_line_m3, theory_line_m2])

### 8.1.7 Secondary X-axis (Wavelength)
secax = ax.secondary_xaxis('top', functions=(lambda f: 1/f, lambda w: 1/w))
secax.set_xlabel('Wavelength (km)')

### 8.1.8 Axes Formatting
ax.set_xlabel("Frequency (cyc/km)")
ax.set_ylabel("PSD [cm²/cyc/km]")
ax.set_title("Ensemble PSD + Coherence (MVP, SWOT, DUACS)")
ax.grid(True, which='both', linestyle='--', alpha=0.4)
ax.set_xscale('log')
ax.set_yscale('log')
ax.set_xlim(1e-3, 1)

### 8.1.9 Secondary Y-axis for Coherence
ax2 = ax.twinx()
ax2.set_ylim(0, 2)   # coherence (0-1) occupies the lower half of the axis, clear of the PSD curves
ax2.set_yticks([0, 0.5, 1, 1.5, 2])
ax2.set_yticklabels(['0', '0.5', '1', '', ''])
ax2.set_ylabel('Coherence (0-1)')

coherence_colors = ['darkblue', 'darkgreen', 'darkred']
handles_coherence = []

# MVP–SWOT coherence
if freq_coh_ens_mvp_swot is not None and coh_ens_mvp_swot is not None:
    valid_coh = (~np.isnan(freq_coh_ens_mvp_swot)) & (~np.isnan(coh_ens_mvp_swot)) & (freq_coh_ens_mvp_swot > 0)
    if np.sum(valid_coh) > 1:
        line_coherence_mvp_swot, = ax2.semilogx(freq_coh_ens_mvp_swot[valid_coh],
                                               coh_ens_mvp_swot[valid_coh],
                                               color=coherence_colors[0],
                                               label='Coherence MVP-SWOT',
                                               zorder=3)
        handles_coherence.append(line_coherence_mvp_swot)

# MVP–DUACS coherence
if freq_coh_ens_mvp_duacs is not None and coh_ens_mvp_duacs is not None:
    valid_coh = (~np.isnan(freq_coh_ens_mvp_duacs)) & (~np.isnan(coh_ens_mvp_duacs)) & (freq_coh_ens_mvp_duacs > 0)
    if np.sum(valid_coh) > 1:
        line_coherence_mvp_duacs, = ax2.semilogx(freq_coh_ens_mvp_duacs[valid_coh],
                                                coh_ens_mvp_duacs[valid_coh],
                                                color=coherence_colors[1],
                                                label='Coherence MVP-DUACS',
                                                zorder=3)
        handles_coherence.append(line_coherence_mvp_duacs)

# SWOT–DUACS coherence
if freq_coh_ens_swot_duacs is not None and coh_ens_swot_duacs is not None:
    valid_coh = (~np.isnan(freq_coh_ens_swot_duacs)) & (~np.isnan(coh_ens_swot_duacs)) & (freq_coh_ens_swot_duacs > 0)
    if np.sum(valid_coh) > 1:
        line_coherence_swot_duacs, = ax2.semilogx(freq_coh_ens_swot_duacs[valid_coh],
                                                 coh_ens_swot_duacs[valid_coh],
                                                 color=coherence_colors[2],
                                                 label='Coherence SWOT-DUACS',
                                                 zorder=3)
        handles_coherence.append(line_coherence_swot_duacs)

### 8.1.10 Legend (all elements)
handles_curves = [h for h in [line_mvp, line_swot, line_duacs] if h is not None]
patches_background = list(ax.patches)   # spatial-scale shading patches
handles_all = patches_background + handles_curves + handles_slopes + handles_coherence
labels_all = [obj.get_label() for obj in handles_all]

fig.legend(handles_all, labels_all, loc='lower center',
           bbox_to_anchor=(0.5, -0.25), ncol=3, borderaxespad=0.)

plt.tight_layout()
plt.subplots_adjust(bottom=0.3)
plt.show()
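`loglog_regression` is defined earlier in the notebook; assuming it performs an ordinary least-squares fit in log10 space and returns `(slope, intercept)` — consistent with the `10**c * f**m` reconstruction used above — its core can be sketched and checked against a synthetic k⁻³ spectrum:

```python
import numpy as np

def loglog_fit(freq, psd):
    """Least-squares fit log10(psd) = m*log10(freq) + c over positive, finite points."""
    valid = np.isfinite(freq) & np.isfinite(psd) & (freq > 0) & (psd > 0)
    if np.sum(valid) < 2:
        return np.nan, np.nan
    m, c = np.polyfit(np.log10(freq[valid]), np.log10(psd[valid]), 1)
    return m, c

# Synthetic power-law spectrum: the fit should recover slope ≈ -3
f = np.logspace(-3, 0, 50)
p = 1e-2 * f**-3
m, c = loglog_fit(f, p)
```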

## 8.2 Zoomed Figure – 0.01 ≤ f ≤ 1 cyc km⁻¹

### 8.2.1 Figure and Axes
fig_zoom = plt.figure(figsize=(8,6))
axz = fig_zoom.add_subplot(111)

### 8.2.2 Spatial‐scale Background (zoom)
axz.axvspan(1e-6, 0.01, color='yellow', alpha=0.1, zorder=0, label='Large scale (>100 km)')
axz.axvspan(0.01, 0.1, color='green', alpha=0.1, zorder=0, label='Mesoscale (10-100 km)')
axz.axvspan(0.1, 1e2, color='red', alpha=0.1, zorder=0, label='Small scale (<10 km)')

### 8.2.3 Helper for PSD Lines (zoom)
def plot_psd_line_zoom(freq, psd, color, label):
    if freq is not None and psd is not None:
        valid = (~np.isnan(freq)) & (~np.isnan(psd)) & (psd > 0)
        if np.sum(valid) < 2:
            return None
        line, = axz.loglog(freq[valid], psd[valid], color=color, linestyle='-', label=label)
        return line
    return None

line_mvp_z   = plot_psd_line_zoom(freq_mvp_ens, psd_mvp_ens, color_mvp,   'MVP PSD')
line_swot_z  = plot_psd_line_zoom(freq_swot_ens, psd_swot_ens, color_swot,  'SWOT PSD')
line_duacs_z = plot_psd_line_zoom(freq_duacs_ens, psd_duacs_ens, color_duacs,'DUACS PSD')

### 8.2.4 Regression Lines (zoom)
handles_slopes_z = []
if freq_mvp_ens is not None and psd_mvp_ens is not None:
    m_mvp_z, c_mvp_z = loglog_regression(freq_mvp_ens, psd_mvp_ens)
    if not np.isnan(m_mvp_z):
        reg_line_mvp_z, = axz.loglog(freq_mvp_ens, 10**c_mvp_z * freq_mvp_ens**m_mvp_z, 
                                     color=color_mvp, linestyle='--',
                                     label=f"MVP slope = {m_mvp_z:.2f}")
        handles_slopes_z.append(reg_line_mvp_z)

if freq_swot_ens is not None and psd_swot_ens is not None:
    m_swot_z, c_swot_z = loglog_regression(freq_swot_ens, psd_swot_ens)
    if not np.isnan(m_swot_z):
        reg_line_swot_z, = axz.loglog(freq_swot_ens, 10**c_swot_z * freq_swot_ens**m_swot_z, 
                                      color=color_swot, linestyle='--',
                                      label=f"SWOT slope = {m_swot_z:.2f}")
        handles_slopes_z.append(reg_line_swot_z)

if freq_duacs_ens is not None and psd_duacs_ens is not None:
    m_duacs_z, c_duacs_z = loglog_regression(freq_duacs_ens, psd_duacs_ens)
    if not np.isnan(m_duacs_z):
        reg_line_duacs_z, = axz.loglog(freq_duacs_ens, 10**c_duacs_z * freq_duacs_ens**m_duacs_z, 
                                       color=color_duacs, linestyle='--',
                                       label=f"DUACS slope = {m_duacs_z:.2f}")
        handles_slopes_z.append(reg_line_duacs_z)

### 8.2.5 Reference‐slope Lines (zoom)
f_theory_z = np.linspace(1e-3, 1, 200)
ref_val_z = psd_mvp_ens[0] * 0.8 if (psd_mvp_ens is not None) and (len(psd_mvp_ens) > 0) else 1e4
theory_line_m3_z, = axz.loglog(f_theory_z, ref_val_z * (f_theory_z/f_theory_z[0])**(-3), 
                               color='black', linestyle=':', label="Ref slope -3")
theory_line_m2_z, = axz.loglog(f_theory_z, ref_val_z * (f_theory_z/f_theory_z[0])**(-2), 
                               color='gray', linestyle=':', label="Ref slope -2")
handles_slopes_z.extend([theory_line_m3_z, theory_line_m2_z])

### 8.2.6 Secondary X-axis (wavelength, zoom)
secax_z = axz.secondary_xaxis('top', functions=(lambda f: 1/f, lambda w: 1/w))
secax_z.set_xlabel('Wavelength (km)')

### 8.2.7 Axes Formatting (zoom)
axz.set_xlabel("Frequency (cyc/km)")
axz.set_ylabel("PSD [cm²/cyc/km]")
axz.set_title("Ensemble PSD + Coherence (MVP, SWOT, DUACS) – Zoom")
axz.grid(True, which='both', linestyle='--', alpha=0.4)
axz.set_xscale('log')
axz.set_yscale('log')
axz.set_xlim(0.01, 1)

### 8.2.8 Secondary Y-axis for Coherence (zoom)
ax2z = axz.twinx()
ax2z.set_ylim(0, 2)   # coherence (0-1) occupies the lower half of the axis, clear of the PSD curves
ax2z.set_yticks([0, 0.5, 1, 1.5, 2])
ax2z.set_yticklabels(['0', '0.5', '1', '', ''])
ax2z.set_ylabel('Coherence (0-1)')

handles_coherence_z = []

# MVP–SWOT coherence (zoom)
if freq_coh_ens_mvp_swot is not None and coh_ens_mvp_swot is not None:
    valid_coh_z = (~np.isnan(freq_coh_ens_mvp_swot)) & (~np.isnan(coh_ens_mvp_swot)) & (freq_coh_ens_mvp_swot > 0)
    if np.sum(valid_coh_z) > 1:
        line_coherence_mvp_swot_z, = ax2z.semilogx(freq_coh_ens_mvp_swot[valid_coh_z],
                                                  coh_ens_mvp_swot[valid_coh_z],
                                                  color=coherence_colors[0],
                                                  label='Coherence MVP-SWOT',
                                                  zorder=3)
        handles_coherence_z.append(line_coherence_mvp_swot_z)

# MVP–DUACS coherence (zoom)
if freq_coh_ens_mvp_duacs is not None and coh_ens_mvp_duacs is not None:
    valid_coh_z = (~np.isnan(freq_coh_ens_mvp_duacs)) & (~np.isnan(coh_ens_mvp_duacs)) & (freq_coh_ens_mvp_duacs > 0)
    if np.sum(valid_coh_z) > 1:
        line_coherence_mvp_duacs_z, = ax2z.semilogx(freq_coh_ens_mvp_duacs[valid_coh_z],
                                                   coh_ens_mvp_duacs[valid_coh_z],
                                                   color=coherence_colors[1],
                                                   label='Coherence MVP-DUACS',
                                                   zorder=3)
        handles_coherence_z.append(line_coherence_mvp_duacs_z)

# SWOT–DUACS coherence (zoom)
if freq_coh_ens_swot_duacs is not None and coh_ens_swot_duacs is not None:
    valid_coh_z = (~np.isnan(freq_coh_ens_swot_duacs)) & (~np.isnan(coh_ens_swot_duacs)) & (freq_coh_ens_swot_duacs > 0)
    if np.sum(valid_coh_z) > 1:
        line_coherence_swot_duacs_z, = ax2z.semilogx(freq_coh_ens_swot_duacs[valid_coh_z],
                                                    coh_ens_swot_duacs[valid_coh_z],
                                                    color=coherence_colors[2],
                                                    label='Coherence SWOT-DUACS',
                                                    zorder=3)
        handles_coherence_z.append(line_coherence_swot_duacs_z)

### 8.2.9 Legend (zoom)
patches_background_z = list(axz.patches)   # spatial-scale shading patches
handles_curves_z = [h for h in [line_mvp_z, line_swot_z, line_duacs_z] if h is not None]
handles_all_zoom = patches_background_z + handles_curves_z + handles_slopes_z + handles_coherence_z
labels_all_zoom = [obj.get_label() for obj in handles_all_zoom]

fig_zoom.legend(handles_all_zoom, labels_all_zoom, loc='lower center',
                bbox_to_anchor=(0.5, -0.25), ncol=3, borderaxespad=0.)

plt.tight_layout()
plt.subplots_adjust(bottom=0.3)
plt.show()
[SPECTRAL ANALYSIS] - PL5 in progress...
[SPECTRAL ANALYSIS] - PL6 in progress...
[SPECTRAL ANALYSIS] - PL7 in progress...
[SPECTRAL ANALYSIS] - PL8 in progress...
[SPECTRAL ANALYSIS] - PL9 in progress...
[SPECTRAL ANALYSIS] - PL10 in progress...
[SPECTRAL ANALYSIS] - PL11 in progress...
[SPECTRAL ANALYSIS] - PL12 in progress...
[SPECTRAL ANALYSIS] - PL13 in progress...
[SPECTRAL ANALYSIS] - PL14 in progress...
[SPECTRAL ANALYSIS] - PL15 in progress...
[SPECTRAL ANALYSIS] - PL16 in progress...
[SPECTRAL ANALYSIS] - PL17 in progress...
[SPECTRAL ANALYSIS] - PL18 in progress...
[SPECTRAL ANALYSIS] - PL19 in progress...
[SPECTRAL ANALYSIS] - PL20 in progress...
[SPECTRAL ANALYSIS] - PL21 in progress...
[SPECTRAL ANALYSIS] - PL22 in progress...
[SPECTRAL ANALYSIS] - PL23 in progress...
[SPECTRAL ANALYSIS] - PL24 in progress...